(57) Abstract: EKG sensors (150) are placed on a patient (140) to receive electrocardiogram
(EKG) recording signals, which are typically combinations of original signals from different
sources, such as pacemaker signals, QRS complex signals, and irregular oscillatory signals that
suggest an arrhythmia condition. A computing module (120) uses independent component
analysis to separate the recorded EKG signals. The separated signals are displayed to help
physicians analyze heart conditions and identify probable locations of abnormal heart
conditions. At least a portion of the separated signals can be further displayed in a chaos
phase space portrait to help detect abnormality in heart conditions.

SYSTEM AND METHOD FOR SEPARATING CARDIAC SIGNALS



Background of the Invention

Field of the Invention 

The present invention relates to medical devices for recording cardiac signals and
separating the recorded cardiac signals.

Description of the Related Art

Electrocardiogram (EKG) recording is a valuable tool for physicians to study patient heart 
conditions. In a typical 12-lead arrangement, up to 12 sensors are placed on a subject's chest or 
abdomen and limbs to record the electric signals from the beating heart. Each sensor, along with a 
reference electrode, forms a separate channel that produces an individual signal. The signals from
the different sensors are recorded on an EKG machine as different channels. The sensors are 
usually unipolar or bipolar electrodes or other devices suitable for measuring the electrical potential 
on the surface of a human body. Since different parts of the heart, such as the atria and ventricles, 
produce different spatial and temporal patterns of electrical activity on the body surface, the signals 
recorded on the EKG machine are useful for analyzing how well individual parts of the heart are
functioning. 

A typical heartbeat signal has several well-characterized components. The first component
is a small hump at the beginning of a heartbeat called the "P-Wave". This signal is produced by the
right and left atria. There is a flat area after the P-Wave which is part of what is called the PR
Interval. During the PR interval the electrical signal is traveling through the atrioventricular
(AV) node. The next large spike in the heartbeat signal is called the "QRS Complex." The QRS
Complex is a tall, spiky signal produced by the ventricles. Following the QRS complex is another
smaller bump in the signal called the "T-Wave," which represents the electrical resetting of the
ventricles in preparation for the next signal. When the heart beats continuously, the P-QRS-T waves
repeat over and over.

Many publications have described studying cardiac signals and detecting abnormal heart
conditions. Sample publications include U.S. Patent Publication No. 20020052557; Podrid &
Kowey, Cardiac Arrhythmia: Mechanisms, Diagnosis, and Management, Lippincott Williams &
Wilkins Publishers (2nd edition, August 15, 2001); Marriott & Conover, Advanced Concepts in
Arrhythmias, Mosby Inc. (3rd edition, January 15, 1998); and Josephson, M.E., Clinical Cardiac
Electrophysiology: Techniques and Interpretations, Lippincott Williams & Wilkins Publishers;
ISBN (3rd edition, December 15, 2001).

Unfortunately, although EKG signals have been studied for decades, they are difficult to
assess because EKG signals recorded at the surface are mixtures of signals from multiple sources.
Typically, it is relatively straightforward to measure the shape of the QRS complex since this signal
is so strong. However, irregularly shaped P-wave or T-wave signals, along with weak irregular



WO 03/003905 PCT/US02/21277 

oscillatory signals that suggest a heart arrhythmia, are often masked by large pacemaker signals or
the strong QRS complex signals. Thus, it can be very difficult to isolate small irregular oscillatory
signals and to identify arrhythmia conditions. 

In addition, atrial and ventricular signals are sometimes undesirably superimposed over one 
another. In many cases, diagnosis of disease states requires these signals to be separated from one
another. For example, it might be desirable to separate P wave signals from QRS complex signals,
so that signals originating in an atrium are isolated from signals representing concurrent activities in 
the ventricle. 

In some practices the EKG signals are electronically "filtered" by excluding signals of
certain frequencies. The signals are also "averaged" to remove largely random or asynchronous
data, which is assumed to be meaningless "noise." The filtering and averaging methods
irreversibly eliminate portions of the recorded signals. In addition, it is not proven whether the
more random data is truly "noise" and truly meaningless. It might be that the signals that are
removed are indicative of a disease state in a patient. Another method, disclosed in U.S. Patent
No. 6,308,094 entitled "System for prediction of cardiac arrhythmias," uses the Karhunen-Loeve
Transformation to decompose or compress cardiac signals into elements that are deemed
"significant." As a result, the information that is deemed "insignificant" is lost.

Compared to other signal separation applications, separating EKG recording signals 
presents additional challenges. For example, the sources are not always stationary since the heart
chambers contract and expand during beating. Additionally, the activity of a single chamber may
be mistaken for multiple sources because of the presence of moving waves of electrical activity 
across the heart. If electrodes are not securely attached to the patient, or if the patient moves (for 
example older patients may suffer from uncontrolled jittering), the movement of the electrodes also 
undesirably generates signals. In addition, multiple signals can be sensed by the EKG which are
unrelated to the cardiac signature, such as myopotentials, i.e., electrical signals from muscles other
than the heart. 

Cardiac rhythm management systems that store a list of triggers have been disclosed.
U.S. Patent No. 6,400,982 entitled "Cardiac rhythm management system with arrhythmia
prediction and prevention" discloses such a system. If a trigger matches detected cardiac signals
from a patient, the system calculates the probability of arrhythmia and activates a prevention
therapy to the patient. However the cardiac signals are in fact mixtures of signals from multiple 
sources, and the signals that are important for arrhythmia detection can be masked by other signals. 
It is therefore desirable to separate the cardiac signals used in the cardiac rhythm management 
systems. 

Independent component analysis (ICA) is a technique for separating mixed source signals
(components) which are presumably independent from each other. In its simplified form,
independent component analysis applies an "un-mixing" matrix of weights to the mixed signals, for
example by multiplying the matrix with the mixed signals, to produce separated signals. The weights
are assigned initial values, and then adjusted to minimize information redundancy in the separated
signals. Because this technique does not require information on the source of each signal, it is
known as a "blind source separation" method. Blind separation problems refer to the idea of
separating mixed signals that come from multiple independent sources. Although there are many
ICA techniques currently known, most have evolved from the original work described in U.S.
Patent No. 5,706,402 issued on January 6, 1998. Additional references on ICA and blind source
separation can be found in, for example, A. J. Bell and T. J. Sejnowski, Neural Computation
7:1129-1159 (1995); Te-Won Lee, Independent Component Analysis: Theory and Applications, Kluwer
Academic Publishers, Boston, September 1998; Hyvarinen et al., Independent Component Analysis,
1st edition (Wiley-Interscience, May 18, 2001); Mark Girolami, Self-Organizing Neural Networks:
Independent Component Analysis and Blind Source Separation (Perspectives in Neural Computing)
(Springer Verlag, September 1999); and Mark Girolami (Editor), Advances in Independent
Component Analysis (Perspectives in Neural Computing) (Springer Verlag, August 2000). Singular
value decomposition algorithms have been disclosed in Adaptive Filter Theory by Simon Haykin
(Third Edition, Prentice-Hall (NJ), 1996).
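
As an illustration of the un-mixing operation described above, the following Python sketch mixes two synthetic sources with a known matrix and recovers them by multiplying the mixtures with the inverse of that matrix. The sources, the mixing matrix `A`, and the helper names are hypothetical and are not taken from the disclosure. In practice the mixing matrix is unknown, and ICA must estimate the un-mixing weights from the recorded signals alone.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Invert a 2 x 2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two hypothetical sources: a sparse pulse train and a slow oscillation.
n = 200
s1 = [1.0 if t % 50 == 0 else 0.0 for t in range(n)]
s2 = [math.sin(2 * math.pi * t / 17) for t in range(n)]

A = [[1.0, 0.6], [0.4, 1.0]]      # assumed mixing matrix (unknown in practice)
mixed = matmul(A, [s1, s2])       # what two sensors would record
W = inverse_2x2(A)                # ideal un-mixing matrix for this toy case
separated = matmul(W, mixed)      # separated signals = W * mixed signals
```

Each row of `separated` matches one of the original sources up to floating-point error, which is the outcome an ICA method aims for without access to `A`.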

It has been suggested that chaos theory be used to analyze cardiac signals to detect abnormal
heart conditions. Sample disclosures include U.S. Patent Nos. 5,439,004, 5,342,401, 5,447,520 and
5,456,690; PCT application Nos. WO02/34123 and WO0224276; Smith et al., Electrical Alternans
and Cardiac Electrical Instability, Circulation, Vol. 77, No. 1, pp. 110-121 (January 1988). Other
approaches are disclosed in U.S. Patent No. 5,447,520 issued to Spano, et al. and U.S. Patent No.
5,201,321 issued to Fulton. Chaos theory is defined as the study of complex nonlinear dynamic
systems. Complex implies just that, nonlinear implies recursion and higher mathematical
algorithms, and dynamic implies non-constant and non-periodic. Thus chaos theory is, very
generally, the study of changing complex systems based on mathematical concepts of recursion,
whether in the form of a recursive process or a set of differential equations modeling a physical
system.

When a bounded chaotic system has some kind of long-term pattern, but the pattern is not a
simple periodic oscillation or orbit, then the system has a "Strange Attractor". If the system's
behavior is plotted in a graph over an extended period, patterns can be discovered that are not
obvious in the short term. In addition, in these types of systems, no matter what the initial
conditions are, usually the same pattern is found to emerge. The area for which this recurring
pattern holds true is called the "basin of attraction" for the attractor. Chaos theory methods have
been described in, for example, N. H. Packard, J. P. Crutchfield, J. Doyne Farmer, and R. S. Shaw,
Geometry of a Time Series, Physical Review Letters, 47 (1980), p. 712; F. Takens, Detecting
Strange Attractors in Turbulence, in Lecture Notes in Mathematics 898, D. A. Rand and L. S.
Young, eds., (Berlin: Springer-Verlag, 1981), p. 336; and J. P. Crutchfield, J. Doyne Farmer, N. H.
Packard, and R. S. Shaw, On Determining the Dimension of Chaotic Flows, Physica 3D, (1981),
pp. 605-17.

For all of these reasons, what is needed in the art is a system that can accurately separate 
medical signals from one another in order to diagnose disease states.

Summary of the Invention

The present application discloses systems and methods for using independent component 
analysis to determine the existence and location of anomalies such as arrhythmias of a heart. The 
disclosed systems and methods can be applied to suggest the location of atrial fibrillation, and to 
locate arrhythmogenic regions of a chamber of the heart using heart cycle signals measured from a
body surface of the patient. Non-invasive localization of the ectopic origin allows focal treatment to
be quickly targeted to effectively inhibit these complex arrhythmias without having to rely on 
widespread and time consuming sequential searches or on massively invasive simultaneous 
intracardiac sensor techniques. The effective localization of these complex arrhythmias can be
significantly enhanced by using independent component analysis to separate superimposed heart
cycle signals originating from differing chambers or regions of the heart tissue. In addition, the
signals that are separated by ICA are preferably also analyzed by plotting them on a chaos phase 
space portrait. 

One aspect of the invention relates to a medical system for separating cardiac signals. This 
aspect includes a receiving module to receive recorded cardiac signals from medical sensors, a
computing module to separate the received signals using independent component analysis to
produce separated signals, and a display module to display the separated signals. 

Another aspect of the invention relates to a method of detecting arrhythmia in a patient. 
The method includes placing EKG sensors on a patient to produce recorded EKG signals, sending 
the recorded signals to a computing module to separate the recorded signals into separated signals
using independent component analysis, and reviewing a display of the separated signals to
determine the existence of arrhythmia in the patient. In a preferred embodiment, each component
of separated signals corresponds to a channel of recorded signals and its sensor location; therefore,
when the one or more components of separated signals that suggest arrhythmia are detected, the 
corresponding one or more sensor locations also suggest the location of arrhythmia. 

Yet another aspect of the invention relates to a cardiac rhythm management system. The
system includes a cardiac signal recording module to record cardiac signals of a patient, a
computing module to separate the recorded signals into separated signals using independent 
component analysis, and a detection module to detect or to predict an abnormal condition based on 
analyzing the separated signals. The system also includes a treatment module to treat the patient or
a warning module to issue a warning when the abnormal condition is detected or predicted.

Other aspects and embodiments of the invention are described below in the detailed 
description section or defined by the claims.

Brief Description of the Drawings
FIGURE 1 is a diagram of an EKG system according to one embodiment of the invention.
FIGURE 2 is a flowchart illustrating one embodiment of a process for separating cardiac
signals.

FIGURE 3A is a sample chart of recorded EKG signals.

FIGURE 3B is a sample chart of separated EKG signals. 

FIGURE 3C is a sample chart of one component of separated signals back projected on the 
recorded signals. 

FIGURE 4A is a chaos phase space portrait of three components of separated EKG signals
of a healthy subject.

FIGURE 4B is a chaos phase space portrait of three components of separated EKG signals 
of a subject with an abnormal heart condition. 

Detailed Description of the Preferred Embodiment 
Embodiments of the invention relate to a system and method for accurately separating 
medical signals in order to determine disease states in a patient. In one embodiment, the system
analyzes EKG signals in order to determine whether a patient has a heart ailment or irregularity. As 
discussed in detail below, embodiments of the system utilize the techniques of independent 
component analysis to separate the medical signals from one another. 

In addition to the signal separation technique, embodiments of the invention also relate to
systems and methods that first separate signals using ICA, and then perform an analysis on a
specific isolated signal, or set of isolated signals, using a "chaos" analysis. As described earlier,
chaos theory (also called nonlinear dynamics) studies patterns that are not completely random, but
cannot be determined by simple formulas. Because cardiac signals are typically non-random, but
cannot be easily described by a simple formula, chaos theory analysis as described below provides
an effective tool to analyze these signals and determine disease states.

Accordingly, once the signals are separated using ICA, they can be plotted to produce a
chaos phase space portrait. By reviewing the patterns in the phase space portrait, for example
reviewing the existence and location of one or more attractors, or comparing established healthy
patterns and established abnormal patterns with the patterns of the patient, a user is able to assess
the likelihood of abnormality in the signals, which indicates disease conditions in the patient.

FIGURE 1 is a diagram of an EKG system that includes a computing module for signal
separation according to one embodiment of the present invention. As shown in FIGURE 1,
electrode sensors 150 are placed on the chest and limbs of a patient 140 to record electric signals.
The electrodes send the recorded signals to a receiving module 110 of the EKG system 100. After
optionally performing signal amplification, analog-to-digital conversion or both, the receiving
module 110 sends the received signals to a computing module 120 of the EKG system 100. The
computing module 120 uses an independent component analysis method to separate the recorded
signals to produce separated signals. The independent component analysis method is
described in detail in the Appendix and below with respect to FIGURE 2.

The computing module 120 can be implemented in hardware, software, or a combination of 
both. It can be located physically within the EKG system 100 or connected to the recorded signals 
received by the EKG system 100. A displaying module 130, which includes a printer or a monitor,
displays the separated signals on paper or on screen. The displaying module 130 can be located 
within the EKG system 100 or connected to it. Optionally, the displaying module 130 also displays 
the recorded signals on paper or on screen. In one embodiment, the displaying module also 
displays some components of the separated signals in a chaos phase space portrait. 

In one embodiment, the EKG system 100 also includes a database (not shown) that stores
recognized EKG signal triggers and corresponding diagnoses. The triggers refer to conditions that
indicate the likelihood of arrhythmia. For example, triggers can include sinus beats, premature
sinus beats, beats following long sinus pauses, long-short beat sequences, R on T-wave beats,
ectopic ventricular beats, premature ventricular beats, and so forth. Triggers can include threshold
values that indicate arrhythmia, such as threshold values of ST elevations, heart rate, increase or
decrease in heart rate, late-potentials, abnormal autonomic activity, and so forth. A left
bundle-branch block diagnosis can be associated with triggers such as the absence of a q wave in
leads I and V6, a QRS duration of more than 120 msec, small notching of the R wave, etc.

Triggers can be based on a patient's history, for example the percentage of abnormal beats
detected during an observation period, the percentage of premature or ectopic beats detected during
an observation period, heart rate variation during an observation period, and so forth. Triggers may 
also include, for example, the increase or decrease of ST elevation in beat rate, the increase in 
frequency of abnormal or premature beats, and so forth. 

A matching module (not shown) attempts to match the separated signals with one or more
of the stored triggers. If a match is found, the matching module displays the matched
corresponding diagnosis, or sends a warning to a healthcare worker or to the patient. Methods such 
as computer-implemented logic rules, classification trees, expert system rules, statistical or 
probability analysis, pattern recognition, database queries, artificial intelligence programs and 
others can be used to match the separated signals with stored triggers. 
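
One of the matching approaches listed above, computer-implemented logic rules, can be sketched as follows. This is a minimal illustration only: the feature names (`qrs_duration_ms`, `q_wave_in_I_and_V6`, `heart_rate_bpm`), the `Trigger` class, and the heart-rate threshold of 100 are hypothetical placeholders, and the step that would extract such features from the separated signals is assumed rather than shown. Only the 120 msec QRS duration and the absent q wave in leads I and V6 come from the description.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping

@dataclass
class Trigger:
    """A stored trigger: a diagnosis label plus a rule over extracted features."""
    diagnosis: str
    rule: Callable[[Mapping[str, Any]], bool]

# Hypothetical trigger database with two illustrative entries.
TRIGGERS = [
    Trigger("possible left bundle-branch block",
            lambda f: f["qrs_duration_ms"] > 120 and not f["q_wave_in_I_and_V6"]),
    Trigger("heart rate above threshold",
            lambda f: f["heart_rate_bpm"] > 100),   # placeholder threshold
]

def match(features: Mapping[str, Any], triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the diagnoses of every stored trigger whose rule is satisfied."""
    return [t.diagnosis for t in triggers if t.rule(features)]

# Hypothetical features derived from separated signals.
example = {"qrs_duration_ms": 135, "q_wave_in_I_and_V6": False, "heart_rate_bpm": 72}
matched = match(example)
```

A non-empty result would correspond to displaying the matched diagnosis or sending a warning; an empty list means no stored trigger matched.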

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating EKG
signals. The process starts from a start block 202, and proceeds to a block 204, where the
computing module 120 of the EKG system 100 receives the recorded signals $X_j$ from the electrode
sensors, with J being the number of channels. Prior to processing, the signals can be amplified to
strengths suitable for computer processing. Analog-to-digital conversion of signals can also be
performed.

From the block 204, the process proceeds to a block 206, where the initial values for an
"un-mixing" matrix of scaling weights $W_{ij}$ are selected. In one embodiment, the initial values for a
matrix of initial weights $W_{i0}$ are also selected. The process then proceeds to a block 208, where a
plurality of training signals $Y_i$ are produced by operating the matrix on the recorded signals. In a
preferred embodiment, the training signals are produced by multiplying the matrix with the
recorded signals such that $Y_i = W_{ij} * X_j$. In one embodiment, the initial weights $W_{i0}$ are included
such that $Y_i = W_{ij} * X_j + W_{i0}$. The process proceeds from the block 208 to a block 210, wherein the
scaling weights $W_{ij}$ and optionally the initial weights $W_{i0}$ are adjusted to reduce the information
redundancy among the training signals. Methods of adjusting the weights are described in
the Appendix.

The process proceeds to a decision block 212, where the process determines whether the
information redundancy has been reduced to a satisfactory level. The criteria for the determination
are described in the Appendix. If the process determines that information redundancy among
the training signals has been reduced to a satisfactory level, then the process proceeds to a block
214, where the training signals are displayed as separated signals $Y_i$, with I being the number of
components for the separated signals. Otherwise the process returns from the block 212 to the
block 208 to again adjust the weights. In a preferred embodiment, I, the number of components of
separated signals, is equal to J, the number of channels of recorded signals. From the block 214, the
process proceeds to an end block 216.
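
The loop of FIGURE 2 can be sketched in Python as shown below. The weight-update rule is not given in this section (it is deferred to the Appendix), so the sketch substitutes a commonly used natural-gradient infomax update with a tanh nonlinearity, which suits spiky (super-Gaussian) sources. That update rule, the learning rate, the stopping tolerance, the synthetic Laplacian sources, and the mixing matrix `A` are all assumptions made for illustration, and the optional initial weights $W_{i0}$ are omitted.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def separate(x, rate=0.1, tol=1e-4, max_iter=500):
    """Estimate an un-mixing matrix W for recorded channels x (a list of rows)."""
    chans, n = len(x), len(x[0])
    eye = [[1.0 if r == k else 0.0 for k in range(chans)] for r in range(chans)]
    w = [row[:] for row in eye]                      # block 206: initial weights
    for _ in range(max_iter):
        y = matmul(w, x)                             # block 208: Y = W * X
        f = [[math.tanh(v) for v in row] for row in y]
        # Sample average of tanh(y_r) * y_k, used as a redundancy measure.
        corr = [[sum(f[r][t] * y[k][t] for t in range(n)) / n
                 for k in range(chans)] for r in range(chans)]
        grad = [[eye[r][k] - corr[r][k] for k in range(chans)]
                for r in range(chans)]
        if max(abs(v) for row in grad for v in row) < tol:   # block 212
            break
        step = matmul(grad, w)                       # block 210: adjust weights
        w = [[w[r][k] + rate * step[r][k] for k in range(chans)]
             for r in range(chans)]
    return w, matmul(w, x)                           # block 214: separated signals

# Synthetic demonstration with two independent Laplacian-distributed sources.
random.seed(0)
n = 2000
sources = [[random.expovariate(1.0) * random.choice((-1.0, 1.0)) for _ in range(n)]
           for _ in range(2)]
A = [[1.0, 0.6], [0.4, 1.0]]          # assumed mixing matrix, unknown to separate()
recorded = matmul(A, sources)
W, separated = separate(recorded)
P = matmul(W, A)                      # near a scaled permutation when separation works
```

Because `A` is known in this synthetic case, the product `P = W * A` can be inspected: when each row of `P` is dominated by a single entry, each separated signal tracks one source. With real recordings no such check is available, which is why the stopping criterion relies on the redundancy measure alone.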

For the un-mixing matrix W with the final weight values, its rows represent the time
courses of relative strengths/activity levels (and relative polarities) of the respective separated
components. Its weights give the surface topography of each component, and provide evidence for
the components' physiological origins. For the inverse of matrix W, its columns represent the
relative projection strengths (and relative polarities) of the respective separated components onto
the channels of recorded signals. The back projection of the ith independent component onto the
recorded signal channels is given by the outer product of the ith row of the separated signals matrix
with the ith column of the inverse un-mixing matrix, and is in the original recorded signals. Thus
cardiac dynamics or activities of interest accounted for by single or by multiple components can be
obtained by projecting one or more ICA components back onto the recorded signals, $X = W^{-1} * Y$,
where Y is the matrix of separated signals, $Y = W * X$.
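
The back projection described above can be illustrated with a small Python sketch. The un-mixing matrix `W` and the recorded samples `X` below are arbitrary hypothetical values chosen only to show the arithmetic; the helper names are likewise illustrative.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Invert a 2 x 2 matrix given as a list of two rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def back_project(w_inv, y, i):
    """Outer product of column i of W^-1 with row i of Y (one row per channel)."""
    return [[w_inv[ch][i] * v for v in y[i]] for ch in range(len(w_inv))]

W = [[1.2, -0.7], [-0.5, 1.3]]                # hypothetical final un-mixing weights
X = [[0.0, 1.0, 0.2, -0.4],                   # hypothetical recorded channel 1
     [0.3, 0.5, -0.1, 0.8]]                   # hypothetical recorded channel 2
Y = matmul(W, X)                              # separated signals, Y = W * X
W_inv = inverse_2x2(W)

parts = [back_project(W_inv, Y, i) for i in range(2)]
total = [[parts[0][ch][t] + parts[1][ch][t] for t in range(4)] for ch in range(2)]
```

Each entry of `parts` is the contribution of one component to every recorded channel, and summing the contributions of all components reproduces `X`, consistent with $X = W^{-1} * Y$.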

The separated signals are determined by the ICA method to be statistically independent and
are presumed to be from independent sources. Regardless of whether there is in fact some
dependence between the separated EKG signals, test results show that the separated signals provide 
a beneficial perspective for physicians to detect and to locate the abnormal heart conditions of a 
patient. 

In a preferred embodiment, time-delay between source signals is ignored. Since the 
sampling frequencies of cardiac signals are in the relatively low 200-500 Hz range, the effect of
time-delay can be neglected.

Improved methods of ICA can be used to speed up the signal separation process. In one
embodiment, a generalized Gaussian mixture model is used to classify the recorded signals into
mutually exclusive classes. The classification methods have been disclosed in U.S. Patent
Application No. 09/418,099 titled "Unsupervised adaptation and classification of multiple classes
and sources in blind source separation" and PCT Application No. WO 01/27874 titled "Unsupervised
adaptation and classification of multi-source data using a generalized Gaussian mixture model." In
another embodiment, the computing module 120 incorporates a priori knowledge of cardiac
dynamics, for example supposing separated QRS components to be highly kurtotic and (ar)rhythmic
component(s) to be sub-Gaussian. ICA methods with incorporated a priori knowledge have been
disclosed in T-W. Lee, M. Girolami and T.J. Sejnowski, Independent Component Analysis using an
Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources, Neural
Computation, 1999, Vol. 11(2): 417-441.
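
The distinction between highly kurtotic and sub-Gaussian components can be made concrete with excess kurtosis, which is positive for spiky (super-Gaussian) signals and negative for sub-Gaussian signals such as a steady oscillation. The sketch below uses two synthetic signals as stand-ins: a sparse pulse train in place of a QRS-like component and a sinusoid in place of an oscillatory component. Both signals and the function name are illustrative assumptions, not data from the disclosure.

```python
import math

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian signal)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0

n = 1000
spiky = [1.0 if t % 50 == 0 else 0.0 for t in range(n)]            # QRS-like stand-in
oscillation = [math.sin(2 * math.pi * t / 50) for t in range(n)]   # oscillatory stand-in

k_spiky = excess_kurtosis(spiky)              # large and positive
k_oscillation = excess_kurtosis(oscillation)  # negative (-1.5 for a pure sinusoid)
```

An ICA variant that incorporates such prior knowledge can choose a different nonlinearity for each component depending on the sign of this statistic, as the extended infomax reference cited above describes.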

FIGURE 3A illustrates a ten-second portion of 12 channels of signals that were gathered as
part of an EKG recording. The horizontal axis in FIGURE 3A represents time progression of ten
seconds. The vertical axis represents channel numbers 1 to 12. The signals of FIGURE 3A are, in
this case, from a patient that provided a mixture of multiple signals, including QRS complex
signals, pacemaker signals, multiple oscillatory activity signals, and noise. However, because these
signals were all occurring simultaneously, they cannot be easily separated from one another using
conventional EKG equipment.

In contrast, FIGURE 3B illustrates output signals separated from the mixture signals of
FIGURE 3A, according to one embodiment of the present invention. As above, the horizontal axis
in FIGURE 3B represents time progression of ten seconds and the vertical axis represents the
separated components 1 to 12. The separated signals in FIGURE 3B are displayed as components 1
to 12 corresponding to the channels 1 to 12 in FIGURE 3A, so that a physician can identify a
separated signal as relating to its respective recorded signal's corresponding sensor location on the
patient body. For example, in a standard 12-lead arrangement, leads II, III and aVF represent
signals from the inferior region. Leads V1 and V2 represent signals from the septal region. Leads V5,
V6, I, and aVL represent signals from the lateral heart. Right and posterior heart regions typically
require special lead placement for recording. To better identify the location of a heart condition,
more than 12 leads can be used. For example, 20, 30, 40, 50, or even hundreds of sensors can be
placed on various portions of a patient's torso. Fewer than 12 leads can also be used. The sensors
are preferably non-invasive sensors located on the patient's body surface, but invasive sensors can
also be used. With separated signals each corresponding to one of the locations, a physician can
review the signals and detect abnormalities that correspond to the respective locations.

As shown in FIGURE 3B, the component #1 represents the pacemaker signals and the early
part of QRS complex signals. The component #2 represents major portions of later parts of the
QRS complex signals. QRS complex signals represent the depolarization of the left ventricle. The
component #10 represents atrial fibrillation (a type of arrhythmia) signals. Therefore atrial
fibrillation is predicted to be located at the sensor location that corresponds to channel #10.
Although components #1 and #10 contain similar frequency contents of oscillatory activity between
heart beats, they capture activities from different spatial locations.

For EKG signals, we discovered that the signals separated using ICA are usually more
independent from each other and have less information redundancy than signals that have not been
processed through ICA. Compared to the recorded signals, the separated signals usually better
represent the signals from the original sources of the patient's heart. In addition to arrhythmia, the
separated cardiac signals can also be used to help detect other heart conditions. For example, the
separated signals, especially the separated QRS complex signals, can be used to detect premature
ventricular contraction. The separated signals, especially the separated Q wave signals, can be used
to detect myocardial infarction. Separating the EKG signals, especially separating the QRS complex
and T wave signals, can help distinguish left and right bundle branch block.

Of course, the disclosed system and method are not limited to detecting arrhythmia, or any
particular type of disease state. Embodiments of the invention include all methods of analyzing
medical signals using ICA. For example, when a pregnant woman undergoes EKG recording, the 
heart signals from the woman and from the fetus(es) can be separated. 

The separated cardiac signals can be characterized as non-random but not easily
deterministic, which makes them suitable subjects for chaotic analysis. As mentioned above, chaos
theory (also called nonlinear dynamics) studies patterns that are not completely random but cannot
be determined by simple formulas. The separated signals can be plotted to produce a chaos phase
space portrait. By reviewing the patterns in the phase space portrait, including the existence and
location of one or more attractors, a user is able to assess the likelihood of abnormality in the
signals, which indicates disease conditions in the patient.

In a preferred embodiment, the QRS complex signals are separated into three different
components, with each component representing a portion of the QRS complex. The 3 components
are 3 data sets that are found to be temporally statistically independent using independent 
component analysis. Using the three components, a 3-dimensional phase space portrait of QRS 
complex can be displayed to show the trajectory of the three components. 

FIGURE 3C is a sample chart of the component #10 of separated signals (as shown in
FIGURE 3B) back projected onto the recorded signals of FIGURE 3A. The separated signals of
component #10, which indicate arrhythmia, are identified by reference number 302 in FIGURE 3C.
The 12 channels of recorded signals are identified by reference number 304 for ease of
identification. FIGURE 3C therefore allows direct visual comparison of a separated component
against channels of recorded signals. The back projections of cardiac dynamics allow us to examine
the amount of information accounted for by single or by multiple components in the recorded
signals and to confirm the components' physiological meanings suggested by the surface
topography (the aforementioned columns of the inverse of the un-mixing matrix).

FIGURE 4A illustrates the phase space portrait of the EKG recording of a healthy subject. 
FIGURE 4B illustrates the phase space portrait of the EKG recording of an atrial fibrillation patient. 
In FIGURES 4A and 4B, the x, y, and z axes represent the amplitudes of the 3 QRS components.
The separated signals' values over time are plotted to produce the phase space portraits. In the
healthy EKG recording of FIGURE 4A, the dense cluster 402 indicates the existence of an attractor
that attracts the signal values to the region of the dense cluster 402. The dense cluster 402
represents the most frequent occurrences of the signals. In the atrial fibrillation patient EKG
recording of FIGURE 4B, an additional loop 404, which is not part of the dense cluster 402, is
below the attractor and the dense cluster 402 and closer to the base plane than the dense cluster 402.
This additional loop 404 is presumably due to the oscillatory activity in the baseline portions of the
EKG signals. The separated component #10 signal that indicates an arrhythmia condition is
presumably responsible for the additional loop 404. The visual pattern can be compared with the
visual pattern of a healthy subject and manually recognized as indicating an abnormal
condition such as atrial fibrillation.
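
A phase space portrait of this kind can be sketched numerically as follows. The example is illustrative only: the three "components" are synthetic, and locating the dense cluster with a 3-D histogram is an assumption of this sketch rather than a step stated in the disclosure. The function names are hypothetical.

```python
import numpy as np

def phase_space_trajectory(c1, c2, c3):
    """Stack three separated components into (T, 3) phase space points."""
    return np.column_stack([c1, c2, c3])

def densest_region(points, bins=10):
    """Return the center and count of the most populated histogram cell."""
    hist, edges = np.histogramdd(points, bins=bins)
    idx = np.unravel_index(np.argmax(hist), hist.shape)
    center = np.array([(e[i] + e[i + 1]) / 2.0 for e, i in zip(edges, idx)])
    return center, int(hist[idx])

# Synthetic data: a tight cluster near the origin plus a sparse loop below it.
rng = np.random.default_rng(0)
cluster = 0.01 * rng.normal(size=(900, 3))
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
loop = np.column_stack([np.cos(theta), np.sin(theta), -np.ones_like(theta)])

components = np.vstack([cluster, loop]).T        # shape (3, T)
points = phase_space_trajectory(*components)
center, count = densest_region(points)
```

The points could then be passed to any 3-D scatter plotting routine; the histogram step only gives a rough numeric handle on where the signal values concentrate.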

Instead of the 3 QRS complex components as shown in FIGURE 4B, other components or
more than 3 components can also be used to plot the chaos phase space portrait. If more than 3
components are used, the different components can be plotted in different colors. The 3 QRS
complex components of FIGURE 4B are selected because test results suggest that such a phase
space portrait is physiologically significant and usually functions well as an indication of a patient's
heart condition.

Although FIGURES 3A, 3B, 4A and 4B were produced using test results related to the
detection and localization of focal atrial fibrillation, the disclosed systems and methods can be used
to detect and to localize other heart conditions including focal and re-entrant arrhythmia. The
disclosed systems and methods can also be used to detect and to localize paroxysmal atrial
fibrillation as well as persistent and chronic atrial fibrillation.

The disclosed methods can be used to improve existing implantable cardioverter/defibrillators (ICDs)
that can deliver electrical stimuli to the heart. In addition to existing ICDs and existing
pacemakers, some of the existing cardiac rhythm management devices also combine the functions
of pacemakers and ICDs. A computing module embodying the disclosed methods can be added to
the existing systems to separate the recorded cardiac signals. The separated signals are then used
by the cardiac rhythm management systems to detect or to predict abnormal conditions. Upon
detection or prediction, the cardiac rhythm management system automatically treats the patient, for
example by delivering pharmacologic agents, pacing the heart in a particular mode, delivering
cardioversion/defibrillation shocks to the heart, or stimulating the sympathetic or
parasympathetic branches of the autonomic nervous system. Instead of or in addition to automatic
treatment, the system can also issue a warning to a physician, a nurse or the patient. The warning
can be issued in the form of an audio signal, a radio signal, and so forth. The disclosed signal
separation methods can be used in cardiac rhythm management systems in hospitals, in patients'
homes or nursing homes, or in ambulances. The cardiac rhythm management systems include
implantable cardioverter defibrillators, pacemakers, biventricular or other multi-site coordination
devices and other systems for diagnostic EKG processing and analysis. The cardiac rhythm
management systems also include automatic external defibrillators and other external monitors,
programmers and recorders.

In one embodiment, an improved cardiac rhythm management system includes a storage
module that stores the separated signals. In one arrangement, the storage module can be removed
from the cardiac rhythm management system and connected to a computing device. In another
arrangement, the storage module is directly connected to a computing device without being
removed from the cardiac rhythm management system. The computing device can provide further
analysis of the separated signals, for example displaying a chaos phase space portrait using some of
the separated signals. The computing device can also store the separated signals to provide a
history of the patient's cardiac signals.

The disclosed methods can also be applied to predict the occurrence of arrhythmia within a
patient's heart. After separating recorded EKG signals into separated signals, the separated signals
can be matched with stored triggers and diagnoses as described above. If the separated signals
match stored triggers that are associated with arrhythmia, an occurrence of arrhythmia is predicted.
In other embodiments, an arrhythmia probability is then calculated, for example based on how
closely the separated signals match the stored triggers, based on records of how frequently the
patient's separated signals have matched the stored triggers in the past, and/or based on how
frequently the patient has actually suffered arrhythmia in the past. The calculated probability can
then be used to predict when the next arrhythmia will occur for the patient. Based on statistics and
clinical data, calculated probabilities can be associated with specified time periods within which an
arrhythmia will occur.
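
One way to picture the trigger matching step is sketched below. The disclosure does not specify a matching metric, so the sliding Pearson correlation, the 0.8 threshold, and the function names `match_score` and `predicts_arrhythmia` are all assumptions made for illustration; the signals are synthetic.

```python
import numpy as np

def match_score(signal, trigger):
    """Best Pearson correlation between a stored trigger and any equally
    long window of a separated signal (values lie between -1 and 1)."""
    n = len(trigger)
    t = (trigger - trigger.mean()) / (trigger.std() + 1e-12)
    best = -1.0
    for start in range(len(signal) - n + 1):
        w = signal[start:start + n]
        w = (w - w.mean()) / (w.std() + 1e-12)
        best = max(best, float(np.dot(w, t)) / n)
    return best

def predicts_arrhythmia(signal, trigger, threshold=0.8):
    """Hypothetical decision rule: predict when the match is close enough."""
    return match_score(signal, trigger) >= threshold

rng = np.random.default_rng(1)
trigger = np.sin(np.linspace(0.0, 4.0 * np.pi, 50))   # stand-in stored trigger
quiet = rng.normal(size=500)                          # separated signal, no event
with_event = quiet.copy()
with_event[200:250] = trigger                         # same signal containing the trigger

score_event = match_score(with_event, trigger)
score_quiet = match_score(quiet, trigger)
```

A score of this kind could serve as the "how closely the separated signals match" input to a probability estimate, alongside the historical match and arrhythmia frequencies mentioned above.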

In addition to EKG signals, the disclosed systems and methods can be applied to separate
other electrical signals such as electroencephalogram signals, electromyographic signals,
electrodermographic signals, and electroneurographic signals. They can be applied to separate
other types of signals, such as sonic signals, optic signals, pressure signals, magnetic signals and
chemical signals. The disclosed systems and methods can be applied to separate signals from
internal sources, for example within a cardiac chamber, within a blood vessel, and so forth. The
disclosed systems and methods can be applied to separate signals from external sources such as the
skin surface or away from the body. They can also be applied to record and to separate signals
from animal subjects.

Although the foregoing has described certain preferred embodiments, other embodiments
will be apparent to those of ordinary skill in the art from the disclosure herein. Additionally, other
combinations, omissions, substitutions and modifications will be apparent to the skilled artisan in
view of the disclosure herein. Accordingly, the present invention is not to be limited by the
preferred embodiments, but is to be defined by reference to the following claims.

The present application incorporates by reference U.S. Patent No. 5,706,402, titled "Blind 
signal processing system employing information maximization to recover unknown signals through 
unsupervised minimization of output redundancy" filed November 28, 1994 in its entirety as an 
APPENDIX as follows. 
United States Patent No. 5,706,402
Inventor: Anthony J. Bell

Blind signal processing system employing information maximization to recover
unknown signals through unsupervised minimization of output redundancy

ABSTRACT



A neural network system and unsupervised learning process
for separating unknown source signals from their received
mixtures by solving the Independent Components Analysis
(ICA) problem. The unsupervised learning procedure solves
the general blind signal processing problem by maximizing
joint output entropy through gradient ascent to minimize
mutual information in the outputs. The neural network
system can separate a multiplicity of unknown source signals
from measured mixture signals where the mixture
characteristics and the original source signals are both
unknown. The system can be easily adapted to solve the
related blind deconvolution problem that extracts an
unknown source signal from the output of an unknown
reverberating channel.
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VUXuSSnTO K^XnOWN STcanccllation. seismic devolution and image re, 
^^^ TOROUG^SDF^VISED toration. For instance, high-speed data transmission over a 

REFERENCE TO GOVERNMENT RIGHTS training mode that transmits a known training sequence to 

establish deconvolutioo parameters or in a blind mode. 

The U. S. Government has rights in meinvention dis- ^ ^ rf oommuaicBtioQ 5VStC ms that may need blind 

dosed and claimed herein pursuant to Office of Naval {Q equalixalion capab iiity bdudes high-capadry line-of-site 

Research grant no. NOOOH-93- 1-0631. digital radio (cellular telecommunications). Such a channel 

- — tv« rrrvrrT^xT suffers from anomalous propagation conditions arising from 

BACKGROUND OF THE INVENTION ™ wmch^Tdegrade digital radio perfor- 

1 Held of the Invention mance by causing the transmitted signal to propagate along 

Ttds invention relates generally to systena for recovering » several paths of different electrical length (mulopath 

J5S TilXowodSaU subjected to transfer through fading). Severe multipath fading requfres a blind equabza- 

an unlmown multichannel system by processing the known tion scheme to recover channel operation, 

output signals therefrom and relates specifically to an in reflection seismology, a rcflcctiott coeffiaent sequence 

info^tion-maximiring neural network that uses unsuper- ^ be blindly extracted from the received signal, which 

vised learning to recover each of a multiplicity of unknown 20 lncludfiS echoes produced at the different reflection points of 

source signals in a multichannel having reverberation. the unknown geophysical modeL The traditional Unear- 

o n>,^, B «f th e Related Art predictive seismic deconvolution method used to remove the 

2. Description of the Related Ait ^awaveform from a seismogram ignores valuable phase 

Blind Signal Processing: In many signal proving g^SSSK mTTflection sdsmogran, This 

ar^iications, the sample signals providcdby die sensors are „ ™a»an deconvolution to 

mixtures of many unknown sources. The ^paration of J™^"^^^ by fuming only a gonad 

sources" problem is to extract the original ^ow^ab f^^^^J^J^ 

from these known mixtures. Generally, the signal sources as swuauua * 

w^asX^XuTcharacteriitics ire unknown. Without Blkd deconvolution can also be used to^ver unknown 

bowledge of the signal sources other than the general „ images that are blurred by transmission through unknown 

statistical assumption of source Independence, this signal systems. 

processing problem is known to the art as the "blind source Blind Separation Methods: Because of the fundamental 
separation problem". The separation is "blind" because importance of both the blind separation and blind deconvo- 
nothing Is known about the statistics of the independent i utio n signal processing problems, practitioners have pro- 
source signals and nothing is known about the mixing posed several classes of methods for solving the problems. 
_ occss * The blind separation problem was first addressed to 1986 by 
The blind separation problem is encountered in many Jutten and Heraul. (-Blind separation of sources F^Ar 
familiar forms. Far instance, the well-known "cocktail adaptive algorithm phased on 

nSTmoblem refers to a situation where the unknown Signal pnetuing 24 (1991) 1-10). who disclose the HJ 

ClfstonaU bounds generated to a room and the m neural network with backward connections that can usuaUy 

kZ^ fsS stonaHre tLHutputs of several micro- solve the simple two-element blind source separation prob- 

SoTs S3 STwi? ^yed and attenuated ten. Disadvantageous!,. Ac HI network imnmons may no. 

ta ^te Tv«X>m«U during aansmiasion from converge to a proper solution to some cases, depending on 

tZTtc ^mteSTwH it is then mixed with other the initial state and on the source statistics. When convex- 

^d^d^^wed a^aSnuated source signals, indud- „ gence is possible, the HJ network appears to converge in two 

tatSCfetSS ^^(reverberatton), which are " fmges. dm first of which ^JESSES 

K^rions ^fmm^d^ns. Im^^STES 

tweaking voices readies one of rwo micro- * ^^^SEZ^'ZSZu 
phones. Other examples involving many mure* and many W Wgheradex cumulaats in 

^oy .^ array, the parsing of the environment fal U.d^pcodcnce * ndnsrotang higher-order sUhatics 

into separate objects by our biological visual system, and the „ among the known sensor Signals. 

separation of biomagnetic sources by a superconducting Other practitioners have attempted to improve the 10 

ouantum interference device (SQUID) array to magnetoen- network to remove some of the disadvantageous features. 

^KraS Cxher import example, of the Mind For instance Somuchyri (™»<^"<™ ™ 

Xe fetation problem include sonar array signal pro- ffl; Stability analysis" Sig~l /Wsstog 24 (1991 ) 21-29) 

cesstoe and signal decoding in cellular telecommunication to examines other higher-order non-linear transfomung tunc- 

rtam tions other than those simple first and third order functions 
that employs linear beamforming to improve HJ network
separation performance. Also, John C. Platt et al.
("Networks For The Separation of Sources That Are Superimposed
and Delayed", Advances in Neural Information
Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo,
1992) propose extending the original magnitude-optimizing
HJ network to estimate a matrix of time delays in addition
to the HJ magnitude mixing matrix. Platt et al. observe that
their modified network is disadvantaged by multiple stable
states and unpredictable convergence.

Pierre Comon ("Independent component analysis, a new
concept?", Signal Processing 36 (1994) 287-314) provides a
detailed discussion of Independent Component Analysis
(ICA), which defines a class of closed form techniques
useful for solving the blind identification and deconvolution
problems. As is known in the art, ICA searches for a
transformation matrix to minimize the statistical dependence
among components of a random vector. This is distinguished
from Principal Components Analysis (PCA), which searches
for a transformation matrix to minimize statistical correlation
among components of a random vector, a solution that
is inadequate for the blind separation problem. Thus, PCA
can be applied to minimize second order cross-moments
among a vector of sensor signals while ICA can be applied
to minimize sensor signal joint probabilities, which offers a
solution to the blind separation problem. Comon suggests
that although mutual information is an excellent measure of
the contrast between joint probabilities, it is not practical
because of computational complexity. Instead, Comon
teaches the use of the fourth-order cumulant tensor (thereby
ignoring fifth-order and higher statistics) as a preferred
measure of contrast because the associated computational
complexity increases only as the fifth power of the number
of unknown signals.
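
The distinction between PCA and ICA drawn above can be illustrated numerically. The following is a minimal sketch under stated assumptions: the mixing matrix and the uniform sources are hypothetical, and the fourth-order cross-cumulant is used only as one convenient indicator of remaining dependence, not as Comon's full contrast function.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(2, 20000))   # independent, non-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                    # hypothetical mixing matrix
X = A @ S

# PCA whitening: rotate onto the principal axes and rescale to unit variance.
C = np.cov(X)
eigvals, E = np.linalg.eigh(C)
V = np.diag(eigvals ** -0.5) @ E.T
Y = V @ X

cov_Y = np.cov(Y)        # identity matrix: second-order cross-moments are gone
G = np.abs(V @ A)        # global source-to-output map; not a permutation

def cross_cumulant(a, b):
    """Fourth-order cross-cumulant cum(a, a, b, b) of zero-mean signals."""
    return (np.mean(a ** 2 * b ** 2)
            - np.mean(a ** 2) * np.mean(b ** 2)
            - 2.0 * np.mean(a * b) ** 2)

c4 = cross_cumulant(Y[0] - Y[0].mean(), Y[1] - Y[1].mean())
mixing_left = (G.min(axis=1) / G.max(axis=1)).min()
```

In this sketch the whitened outputs are exactly uncorrelated, yet each output still contains both sources and the fourth-order cross-cumulant stays clearly away from zero, which is the residual dependence ICA sets out to remove.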

Similarly, Gilles Burel ("Blind separation of sources: A
nonlinear neural algorithm", Neural Networks 5 (1992)
937-947) asserts that the blind source separation problem is
nothing more than the Independent Components Analysis
(ICA) problem. However, Burel proposes an iterative
scheme for ICA employing a back propagation neural network
for blind source separation that handles non-linear
mixtures through iterative minimization of a cost function.
Burel's network differs from the HJ network, which does not
minimize any cost function. Like the HJ network, Burel's
system can separate the source signals in the presence of
noise without attempting noise reduction (no noise hypotheses
are assumed). Also, like the HJ system, practical
convergence is not guaranteed because of the presence of
local minima and computational complexity. Burel's system
differs sharply from traditional supervised back-propagation
applications because his cost function is not defined in terms
of difference between measured and desired outputs (the
desired outputs are unknown). His cost function is instead
based on output signal statistics alone, which permits
"unsupervised" learning in his network.

Blind Deconvolution Methods: The blind deconvolution
art can be appreciated with reference to the text edited by
Simon Haykin (Blind Deconvolution, Prentice-Hall, New
Jersey, 1994), which discusses four general classes of blind
deconvolution techniques, including Bussgang processes,
higher-order cumulant equalization, polyspectra and maximum
likelihood sequence estimation. Haykin neither considers
nor suggests specific neural network techniques suitable
for application to the blind deconvolution problem.

Blind deconvolution is an example of "unsupervised"
learning in the sense that it learns to identify the inverse of
an unknown linear time-invariant system without any physical
access to the system input signal. This unknown system
may be a nonminimum phase system having one or more
zeroes outside the unit circle in the frequency domain. The
blind deconvolution process must identify both the magnitude
and the phase of the system transfer function. Although
identification of the magnitude component requires only the
second-order statistics of the system output signal, identification
of the phase component is more difficult because it
requires the higher-order statistics of the output signal.
Accordingly, some form of non-linearity is needed to extract
the higher-order statistical information contained in the
magnitude and phase components of the output signal. Such
non-linearity is useful only for unknown source signals
having non-Gaussian statistics. There is no solution to the
problem when the input source signal is Gaussian-distributed
and the channel is nonminimum-phase because
all polyspectra of Gaussian processes of order greater than
two are identical to zero.

Classical adaptive deconvolution methods are based
almost entirely on second order statistics, and thus fail to
operate correctly for nonminimum-phase channels unless
the input source signal is accessible. This failure stems from
the inability of second-order statistics to distinguish
minimum-phase information from maximum-phase information
of the channel. A minimum phase system (having all
zeroes within the unit circle in the frequency domain)
exhibits a unique relationship between its amplitude
response and phase response so that second order statistics
in the output signal are sufficient to recover both amplitude
and phase information for the input signal. In a
nonminimum-phase system, second-order statistics of the
output signal alone are insufficient to recover phase information
and, because the system does not exhibit a unique
relationship between its amplitude response and phase
response, blind recovery of source signal phase information
is not possible without exploiting higher-order output signal
statistics. These require some form of non-linear processing
because linear processing is restricted to the extraction of
second-order statistics.
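
The phase-blindness of second-order statistics can be checked on a small example. The two filters below are hypothetical and chosen only for illustration: one has its zero inside the unit circle (minimum phase) and the other is its time-reversed, nonminimum-phase counterpart.

```python
import numpy as np

h_min = np.array([1.0, 0.5])   # zero at z = -0.5, inside the unit circle
h_max = np.array([0.5, 1.0])   # zero at z = -2.0, outside the unit circle

def autocorr(h):
    """Deterministic autocorrelation of a filter impulse response."""
    return np.correlate(h, h, mode="full")

zero_min = np.roots(h_min)
zero_max = np.roots(h_max)

# Second-order descriptions (autocorrelation, magnitude response) coincide.
same_second_order = np.allclose(autocorr(h_min), autocorr(h_max))
H_min = np.fft.fft(h_min, 64)
H_max = np.fft.fft(h_max, 64)
same_magnitude = np.allclose(np.abs(H_min), np.abs(H_max))

# The phase responses differ, so the two channels are not interchangeable.
same_phase = np.allclose(np.angle(H_min), np.angle(H_max))
```

Because a white input filtered by either channel has an output autocorrelation proportional to the filter autocorrelation, no method that looks only at second-order statistics of the output can tell these two channels apart.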

Bussgang techniques for blind deconvolution can be
viewed as iterative polyspectral techniques, where rationales
are developed for choosing the polyspectral orders with
which to work and their relative weights by subtracting a
source signal estimate from the sensor signal output. The
Bussgang techniques can be understood with reference to
Sandro Bellini ("Chapter 2: Bussgang Techniques For Blind
Deconvolution and Equalization", Blind Deconvolution, S.
Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994),
who characterizes the Bussgang process as a class of processes
having an auto-correlation function equal to the
cross-correlation of the process with itself as it exits from a
zero-memory non-linearity.

Polyspectral techniques for blind deconvolution lead to
unbiased estimates of the channel phase without any information
about the probability distribution of the input source
signals. The general class of polyspectral solutions to the
blind decorrelation problem can be understood with reference
to a second Simon Haykin textbook ("Ch. 20: Blind
Deconvolution", Adaptive Filter Theory, Second Ed., Simon
Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1991)
and to Hatzinakos et al. ("Ch. 5: Blind Equalization Based
on Higher Order Statistics (HOS)", Blind Deconvolution,
Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J.,
1994).

Thus, the approaches in the art to the blind separation and
deconvolution problems can be classified as those using
non-linear transforming functions to spin off higher-order
statistics (Jutten et al. and Bellini) and those using explicit
calculation of higher-order cumulants and polyspectra
(Haykin and Hatzinakos et al.). The HJ network does not
reliably converge even for the simplest two-source problem
and the [...] cumulant approach does not reliably converge
because of truncation of the cumulant expansion. There is
accordingly a clearly-felt need for blind [...] methods that
can reliably solve the blind [...] problem for significant
numbers of source signals.

Unsupervised Learning Methods: In the biological sensory
system arts, practitioners have formulated neural training
optimality criteria based on studies of biological sensory
neurons, which are known to solve blind separation and
deconvolution problems of many kinds. The class of supervised
learning techniques normally used with artificial neural
networks are not useful for these problems because
supervised learning requires access to the source signals for
training purposes. Unsupervised learning instead requires
[...] the necessary teaching signals without access to the
source signals.

Practitioners have proposed several rationales for unsupervised
learning in biological sensory systems. [...] Linsker
("An Application of the Principle of Maximum Information
Preservation to Linear Systems", Advances in Neural
Information Processing Systems 1, D. S. Touretzky (ed.),
Morgan-Kaufmann (1989)) shows that his well-known
"infomax" principle (first proposed in 1987) explains why
biological sensory systems operate to minimize information
loss between layers in the presence of noise. In a later
work ("Local Synaptic Learning Rules Suffice to Maximize
Mutual Information in a Linear Network", Neural Computation
4 (1992) 691-702) Linsker describes a two-phase
learning algorithm for maximizing the mutual information
between two layers of a neural network. However, Linsker
assumes a linear input-output transforming function and
multivariate Gaussian statistics for both source signals and
noise components. With these assumptions, Linsker shows
that a "local synaptic" (biological) learning rule is sufficient
to maximize mutual information but he neither considers nor
suggests solutions to the more general blind processing
problem of recovering non-Gaussian source signals in a
non-linear transforming environment.

Simon Haykin ("Ch. 11: Self-Organizing Systems III:
Information-Theoretic Models", Neural Networks: A Comprehensive
Foundation, S. Haykin (ed.), MacMillan, New
York 1994) discusses Linsker's "infomax" principle, which
[...] Haykin also discusses other well-known
principles such as the "minimization of information loss"
principle suggested in 1988 by Plumbley et al. and Barlow's
"principle of minimum redundancy", first proposed in 1961,
either of which can be used to derive a class of unsupervised
learning rules.

Joseph Atick ("Could information theory provide an ecological
theory of sensory processing?", Network 3 (1992)
213-251) applies Shannon's information theory to the
[...] biological optical sensors. Atick [...]
and includes two components: (a) unused channel capacity
arising from suboptimal symbol frequency distribution and
(b) intersymbol redundancy or mutual information. Atick
suggests that [...] apparently evolved to minimize
the troublesome intersymbol redundancy (mutual
information) component of redundancy rather than to minimize
overall redundancy. H. B. Barlow ("Unsupervised
Learning", Neural Computation 1 (1989) 295-311) also
examines this issue and shows that "minimum entropy
coding" in a biological [...] the troublesome mutual
information [...] expense of suboptimal symbol frequency
distribution. Barlow shows that the [...] redundancy can be
minimized in a neural network by feeding each neuron output
back to other neuron inputs through anti-Hebbian synapses
to discourage [...]. This "redundancy reduction" principle
is offered to explain how unsupervised perceptual learning
occurs in animals.

S. Laughlin ("A Simple Coding Procedure Enhances a
Neuron's Information Capacity", Z. Naturforsch. 36 (1981)
910-912) proves that the optical neuron of a blowfly optimizes
information capacity through [...] probability
distribution for each [...] value (minimizing the unused
channel capacity [...] redundancy), thereby [...]
redundancy" principle. J. J. Hopfield ("Olfactory computation
and object perception", Proc. Natl. Acad. Sci. USA 88
(August 1991) 6462-6466) examines the [...] source
solution in neurons using the HJ neuron model for
minimizing output redundancy.

Becker et al. ("Self-organizing neural network that discovers
surfaces in random-dot stereograms", Nature, vol.
355, pp. 161-163, Jan. 9, 1992) propose a [...]
propagation neural network [...] replace the external
teacher (supervised learning) by internally-derived teaching
signals (unsupervised learning). Becker et al. use non-linear
networks to maximize mutual information between different
sets of outputs [...] blind signal recovery requirement. By
increasing redundancy, their network discovers invariance in
separate groups of inputs, which can be selected [...] of
information passed forward to improve processing efficiency.

Thus, it is known in the neural network arts that anti-Hebbian
mutual interaction can be used to explain the
decorrelation or minimization of redundancy observed in
biological vision systems. This can be appreciated with
reference to H. B. Barlow et al. ("Adaptation and Decorrelation
in the Cortex", The Computing Neuron, R. Durbin et
al. (eds.), Addison-Wesley, (1989)) and to Schraudolph et al.
("Competitive Anti-Hebbian Learning of Invariants",
Advances in Neural Information Processing Systems 4, J. E.
Moody et al. (eds.), Morgan-Kaufmann 1992). In fact,
practitioners [...] Linsker's "infomax" principle
and Barlow's "minimum redundancy" principle may
both yield the same neural network learning procedures.
Until now, however, non-linear versions [...]
applicable to the blind signal processing problem have been
unknown in the art.

The Blind Processing Problem: As mentioned above,
blind source separation and blind deconvolution are related
problems in signal processing. The blind source separation
problem can be succinctly stated as where a set of unknown
source signals S_1(t), ..., S_I(t) are mixed together linearly
by an unknown matrix [A_ij]. Nothing is known about the
sources or the mixing process, both of which may be
time-varying, although the mixing process is assumed to
vary slowly with respect to the source. The
task is to recover the original source signals from the [...]
measured superpositions of them, X_1(t), ..., X_J(t), by finding
a square matrix [W_ij] that is a permutation of the inverse of
the unknown matrix [A_ij]. The blind deconvolution problem
can be similarly stated as where a single unknown signal S(t)
is convolved with an unknown tapped delay-line filter
A_1, ..., A_K producing the corrupted measured signal
X(t) = A(t) * S(t), where A(t) is the impulse response of the
unknown (perhaps slowly time-varying) filter. The blind
deconvolution task is to recover S(t) by finding and convolving
X(t) with a tapped delay-line filter W_1, ..., W_L
having the impulse response W(t) that reverses the effect of
the unknown filter A(t).
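
The two problem statements can be made concrete with a small numeric sketch. Everything here is assumed for illustration: the mixing matrix, the two-tap filter, and the Laplacian sources are hypothetical, and the un-mixing matrix and inverse filter are computed from the known answer rather than learned blindly.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# Blind separation model: X = A S, and W is a permutation of inv(A).
S = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # unknown in the real (blind) setting
X = A @ S
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # any permutation is an acceptable answer
W = P @ np.linalg.inv(A)
U = W @ X                                # rows of U are the sources, reordered

# Blind deconvolution model: x = a * s, and w reverses the effect of a.
s = rng.laplace(size=T)
a = np.array([1.0, 0.5])                 # hypothetical tapped delay-line filter
x = np.convolve(s, a)[:T]
w = (-0.5) ** np.arange(30)              # truncated inverse filter of a
u = np.convolve(x, w)[:T]
```

In both cases the blind task is to find `W` or `w` from the measured `X` or `x` alone; the sketch only shows what a correct solution looks like once found.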

There are many similarities between the two problems. In
one, source signals are corrupted by the superposition of
other source signals and, in the other, a single source signal
is corrupted by superposition of time-delayed versions of
itself. In both cases, unsupervised learning is required
because no error signals are available and no training signals
are provided. In both cases, second-order statistics alone are
inadequate to solve the more general problem. For instance,
a second-order decorrelation technique such as that proposed
by Barlow et al. would find uncorrelated (linearly
independent) projections [Y_i] of the input sensor signals [X_i]
when attempting to separate unknown source signals {S_i}
but is limited to discovering a symmetric decorrelation
matrix that cannot reverse the effects of mixing matrix [A_ij]
if the mixing matrix is asymmetric. Similarly, second-order
decorrelation techniques based on the autocorrelation
function, such as prediction-error filters, are phase-blind and
do not offer sufficient information to estimate the phase
characteristics of the corrupting filter A(t) when applied to
the more general blind deconvolution problem.

Thus, both blind signal processing problems require the
use of higher-order statistics as well as certain assumptions
regarding source signal statistics. For the blind separation
problem, the sources are assumed to be statistically independent
and non-Gaussian. With this assumption, the problem
of learning [W_ij] becomes the ICA problem described by
Comon. For blind deconvolution, the original signal S(t) is
assumed to be a "white" process consisting of independent
symbols. The blind deconvolution problem then becomes
the problem of removing from the measured signal X(t) any
statistical dependencies across time that are introduced by
the corrupting filter A(t). This process is sometimes denominated
the "whitening" of X(t).

As used herein, both the ICA procedure and the "whitening"
of a time series are denominated "redundancy reduction".
The first class of techniques uses some type of explicit
estimation of cumulants and polyspectra, which can be
appreciated with reference to Haykin and Hatzinakos et al.
Disadvantageously, such "brute force" techniques are
computationally intensive for high numbers of sources or taps
and may be inaccurate when cumulants higher than fourth
order are ignored, as they usually must be. The second class
of techniques uses static non-linear functions, the Taylor
series expansions of which yield higher-order terms. Iterative
learning rules containing such terms are expected to be
somehow sensitive to the particular higher-order statistics
necessary for accurate redundancy reduction. This reasoning
is used by Comon et al. to explain the HJ network and by
Bellini to explain the Bussgang deconvolver.
Disadvantageously, there is no assurance that the particular
higher-order statistics yielded by the (heuristically) selected
non-linear function are weighted in the manner necessary for
achieving statistical independence. Recall that the known
approach to attempting improvement of the HJ network is to
test various non-linear functions selected heuristically and
that the original functions are not yet improved in the art.

Accordingly, there is a need in the art for an improved
blind processing method, such as some method of rigorously
linking a static non-linearity to a learning rule that performs
gradient ascent in some parameter guaranteed to be usefully
related to statistical dependency. Until now, this was
believed to be practically impossible because of the infinite
number of higher-order statistics associated with statistical
dependency. The related unresolved problems and deficiencies
are clearly felt in the art and are solved by this invention
in the manner described below.

SUMMARY OF THE INVENTION

This invention solves the above problem by introducing a
new class of unsupervised learning procedures for a neural
network that solve the general blind signal processing problem
by maximizing joint input/output entropy through gradient
ascent to minimize mutual information in the outputs.
The network of this invention arises from the unexpectedly
advantageous observation that a particular type of non-linear
signal transform creates learning signals with the higher-order
statistics needed to separate unknown source signals
by minimizing mutual information among neural network
output signals. This invention also arises from the second
unexpectedly advantageous discovery that mutual information
among neural network outputs can be minimized by
maximizing joint output entropy when the learning transform
is selected to match the signal probability distributions
of interest.

The process of this invention can be appreciated as a
generalization of the infomax principle to non-linear units
with arbitrarily distributed inputs uncorrupted by any known
noise sources. It is a feature of the system of this invention
that each measured input signal is passed through a predetermined
sigmoid function to adaptively maximize information
transfer by optimal alignment of the monotonic sigmoid
slope with the input signal peak probability density. It is an
advantage of this invention that redundancy is minimized
among a multiplicity of outputs merely by maximizing total
information throughput, thereby producing the independent
components needed to solve the blind separation problem.

The foregoing, together with other objects, features and
advantages of this invention, can be better appreciated with
reference to the following specification, claims and the
accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of this invention,
reference is now made to the following detailed description
of the embodiments as illustrated in the accompanying
drawing, wherein:

FIGS. 1A, 1B, 1C and 1D illustrate the feature of sigmoidal
transfer function alignment for optimal information
flow in a sigmoidal neuron from the prior art;

FIGS. 2A, 2B and 2C illustrate the blind source separation
and blind deconvolution problems from the prior art;

FIGS. 3A, 3B and 3C provide graphical diagrams illustrating
a joint entropy maximization example where maximizing
joint entropy fails to produce statistically independent
output signals because of improper selection of the
non-linear transforming function;

FIG. 4 shows the theoretical relationship between the
several entropies and mutual information from the prior art;

FIG. 5 shows a functional block diagram of an illustrative
embodiment of the source separation network of this invention;

FIG. 6 is a functional block diagram of an illustrative
embodiment of the blind decorrelating network of this
invention;

FIG. 7 is a functional block diagram of an illustrative
embodiment of the combined blind source separation and
blind decorrelation network of this invention;

FIGS. 8A, 8B and 8C show typical probability density
functions for speech, rock music and Gaussian white noise;

FIGS. 9A and 9B show typical spectra of a speech signal
before and after decorrelation is performed according to the
procedure of this invention;

FIG. 10 shows the results of a blind source separation
experiment performed using the procedure of this invention; and

FIGS. 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, 11I, 11J,
11K and 11L show time domain filter charts illustrating the
results of the blind deconvolution of several different corrupted
human speech signals according to the procedure of
this invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

This invention arises from the unexpectedly advantageous
observation that a class of unsupervised learning rules for
maximizing information transfer in a neural network solves
the blind signal processing problem by minimizing redundancy
in the network outputs. This class of new learning
rules is now described in information theoretic terms, first
for a single input and then for a multiplicity of unknown
input signals.

Information Maximization For a Single Source

In a single-input network, the mutual information that the
output y of a network contains about its input x can be
expressed as:

$$I(y,x) = H(y) - H(y|x)$$ [Eqn. 1]

where H(y) is the entropy of the output signal, H(y|x) is that
portion of the output signal entropy that did not come from
the input signal and I(y,x) is the mutual information. Eqn. 1
can be appreciated with reference to FIG. 4, which illustrates
the well-known relationship between input signal entropy
H(x), output signal entropy H(y) and mutual information
I(y,x).

When there is no noise or when the noise is treated as
merely another unknown input signal, the mapping between
input x and output y is deterministic and the conditional
entropy H(y|x) takes its lowest possible value, diverging to
negative infinity. This divergence is a consequence of the
generalization of information theory to continuous random
variables. The output entropy H(y) is really the differential
entropy of output signal y with respect to some reference,
such as the noise level or the granularity of the discrete
representation of the variables in x and y. These theoretical
complexities can be avoided by restricting the network to the
consideration of the gradient of information theoretic quantities
with respect to some parameter w. Such gradients are
as well-behaved as are discrete-variable entropies because
the reference terms involved in the definition of differential
entropies disappear. In particular, Eqn. 1 can be differentiated
to obtain the corresponding gradients as follows:

$$\frac{\partial}{\partial w} I(y,x) = \frac{\partial}{\partial w} H(y)$$ [Eqn. 2]

because, in the noiseless case, H(y|x) does not depend on w
and its differential disappears. Thus, for continuous deterministic
mappings, the mutual information between network
input and network output can be maximized by following
the gradient of the entropy of the output alone, which
is an unexpectedly advantageous consequence of treating
noise as another unknown source signal. This permits the
discussion to continue without knowledge of the input signal
statistics.

Referring to FIG. 1A, when a single input x is passed
through a transforming function g(x) to give an output
variable y, both I(y,x) and H(y) are maximized when the
high density portion (mode) of the input probability density
function $f_x(x)$ is aligned with the steepest sloping portion of
non-linear transforming function g(x). This is equivalent to
the alignment of a neuron input-output function to the
expected distribution of incoming signals that leads to
optimal information flow in sigmoidal neurons shown in
FIGS. 1C-1D. FIG. 1D shows a zero-mode distribution
matched to the sigmoid function in FIG. 1C. In FIG. 1A, the
input x having a probability distribution $f_x(x)$ is passed
through the non-linear sigmoidal function g(x) to produce
output signal y having a probability distribution $f_y(y)$. The
information in the probability density function $f_y(y)$ varies
responsive to the alignment of the mean and variance of x
with respect to the threshold $w_0$ and slope w of g(x). When
g(x) is monotonically increasing or decreasing (thereby
having a unique inverse), the output signal probability
density function $f_y(y)$ can be written as a function of the
input signal probability density function $f_x(x)$ as follows:

$$f_y(y) = \frac{f_x(x)}{\left|\partial y/\partial x\right|}$$ [Eqn. 3]

where |·| denotes absolute value.

Eqn. 3 leads to the unexpected discovery of an advantageous
gradient ascent process because the output signal
entropy can be expressed in terms of the output signal
probability density function as follows:

$$H(y) = -E[\ln f_y(y)] = -\int_{-\infty}^{+\infty} f_y(y)\,\ln f_y(y)\,dy$$ [Eqn. 4]

where E[·] denotes expected value. Substituting Eqn. 3 into
Eqn. 4 produces the following:

$$H(y) = E\left[\ln\left|\frac{\partial y}{\partial x}\right|\right] - E[\ln f_x(x)]$$ [Eqn. 5]

The second term on the right side of Eqn. 5 is simply the
unknown input signal entropy H(x), which cannot be
affected by any changes in the parameter w that defines
non-linear function g(x). Therefore, only the first term on the
right side of Eqn. 5 need be maximized to maximize the
output signal entropy H(y). This first term is the average
logarithm of the effect of input signal x on output signal y
and may be maximized by considering the input signals as a
"training set" with density $f_x(x)$ and deriving an online,
stochastic gradient ascent learning rule expressed as:

$$\Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w}\left(\ln\left|\frac{\partial y}{\partial x}\right|\right) = \left(\frac{\partial y}{\partial x}\right)^{-1}\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right)$$ [Eqn. 6]

Eqn. 6 defines a scaling measure $\Delta w$ for changing the
parameter w to adjust the log of the slope of sigmoid
function. Any sigmoid function can be used to specify
measure $\Delta w$, such as the widely-used logistic transfer function:

$$y = \left(1 + e^{-u}\right)^{-1}, \quad \text{where } u = wx + w_0$$ [Eqn. 7]

in which the input x is first aligned with the sigmoid function
through multiplication by a scaling weight w and addition of
a bias weight $w_0$ to create an aligned signal u, which is then
non-linearly transformed by the logistic transfer function to
create signal y. Another useful sigmoid function is the
hyperbolic tangent function expressed as y = tanh(u). The
hyperbolic tangent function is a member of the general class
of functions g(x) each representing a solution to the partial
differential equation,

$$\frac{\partial}{\partial x} g(x) = 1 - |g(x)|^r$$ [Eqn. 8]

with a boundary condition of g(0)=0. The parameter r should
be selected appropriately for the assumed kurtosis of the
input probability distribution. For kurtosis above 3, either
the hyperbolic tangent function (r=2) or the non-member
logistic transfer function is well suited for the process of this
invention.

For the logistic transfer function (Eqn. 7), the terms in
Eqn. 6 can be expressed as:

$$\frac{\partial y}{\partial x} = wy(1-y)$$ [Eqn. 9]

$$\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = y(1-y)\left(1 + wx(1-2y)\right)$$ [Eqn. 10]

Dividing Eqn. 10 by Eqn. 9 produces a scaling measure
$\Delta w$ for the scaling weight learning rule of this invention
based on the logistic function:

$$\Delta w = \epsilon\left(\frac{1}{w} + x(1-2y)\right)$$ [Eqn. 11]

where $\epsilon > 0$ is a learning rate.

Similar reasoning leads to a bias measure $\Delta w_0$ for the bias
weight learning rule of this invention based on the logistic
transfer function, expressed as:

$$\Delta w_0 = \epsilon(1-2y)$$ [Eqn. 12]

These two learning rules (Eqns. 11-12) are implemented
by adjusting the respective w or $w_0$ at a "learning rate" ($\epsilon$),
which is usually less than one percent ($\epsilon < 0.01$), as is known
in the neural network arts. Referring to FIGS. 1A-1C, if the
input probability density function $f_x(x)$ is Gaussian, then the
bias measure $\Delta w_0$ operates to align the steepest part of the
sigmoid curve g(x) with the peak of $f_x(x)$, thereby matching
input density to output slope in the manner suggested
intuitively by Eqn. 3. The scaling measure $\Delta w$ operates to
align the edges of the sigmoid curve slope to the particular
width (proportional to variance) of $f_x(x)$. Thus, narrow
probability density functions lead to sharply-sloping sigmoid
functions.

The scaling measure of Eqn. 11 defines an "anti-Hebbian"
learning rule with a second "anti-decay" term. The first
anti-Hebbian term prevents the uninformative solutions
where output signal y saturates at 0 or 1, but such an
unassisted anti-Hebbian rule alone allows the slope w to
disappear at zero. The second anti-decay term (1/w) forces
output signal y away from the other uninformative situation
where slope w is so flat that output signal y stabilizes at 0.5
(FIG. 1A).

The effect of these two balanced effects is to produce an
output probability density function $f_y(y)$ that is close to the
flat unit distribution function, which is known to be the
maximum entropy distribution for a random variable
bounded between 0 and 1. FIG. 1B shows a family of
sigmoid output distributions, with the most informative one
occurring at sigmoid slope $w_{opt}$. Using the logistic transfer
function as the non-linear sigmoid transformation, the learning
rule in Eqn. 11 eventually brings the slope w to $w_{opt}$,
thereby maximizing entropy in output signal y. The bias rule
in Eqn. 12 centers the mode in the sloping region at $w_0$
(FIG. 1A).

If the hyperbolic tangent sigmoid function is used, the
bias measure $\Delta w_0$ then becomes proportional to $-2y$ and the
scaling measure $\Delta w$ becomes proportional to $-2xy + w^{-1}$,
such that $\Delta w_0 = -2y\epsilon$ and $\Delta w = \epsilon(-2xy + w^{-1})$, where $\epsilon$ is the
learning rate. These learning rules offer the same general
features and advantages of the learning rules discussed
above in connection with Eqns. 10-11 for the logistic
transfer function. In general, any sigmoid function in the
class of solutions to Eqn. 8 can be selected to match
a particular input probability distribution in
accordance with the process of this invention to solve the
blind signal processing problem. These unexpectedly advantageous
learning rules can be generalized to the multi-dimensional
case.

Joint Entropy Maximization for Multiple Sources

To appreciate the multiple-signal blind processing method
of this invention, consider the general network diagram
shown in FIG. 2A where the measured input signal vector
[X] is transformed by way of the weight matrix [W] to
produce a monotonically transformed output vector
[Y] = g([W][X] + [W_0]). By analogy to Eqn. 3, the multivariate
probability density function of [Y] can be expressed as

$$f_y(Y) = \frac{f_x(X)}{|J|}$$ [Eqn. 13]

where |J| is the absolute value of the Jacobian of the
transformation that produces output vector [Y] from input
vector [X]. As is well-known in the art, the Jacobian is the
determinant of the matrix of partial derivatives:

$$J = \det\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_n} \end{bmatrix}$$ [Eqn. 14]

where det[·] denotes the determinant of a square matrix.
By analogy to the single-input case discussed above, the
method of this invention maximizes the natural log of the
Jacobian to maximize output entropy H(Y) for a given input
entropy H(X), as can be appreciated with reference to Eqn.
5. The quantity ln|J| represents the volume of space in [Y]
into which points in [X] are mapped. Maximizing this
quantity attempts to spread the training set of input points
evenly in [Y].

For the commonly-used logistic transfer function, the
resulting learning rules can be proven to be as follows:

$$[\Delta W] = \epsilon\left([W^T]^{-1} + ([1] - 2[Y])[X]^T\right)$$ [Eqn. 15]

$$[\Delta W_0] = \epsilon\left([1] - 2[Y]\right)$$ [Eqn. 16]

In Eqn. 15, the first anti-Hebbian term has become an
outer product of vectors and the second anti-decay term has
generalized to an "anti-redundancy" term in the form of the
inverse of the transpose of the weight matrix [W]. Eqn. 15
can be written for an individual weight as follows:

$$\Delta W_{ij} = \epsilon\left(\frac{\mathrm{cof}[W_{ij}]}{\det[W]} + X_j(1 - 2Y_i)\right)$$ [Eqn. 17]

where $\mathrm{cof}[W_{ij}]$ denotes the cofactor of element $W_{ij}$, which
is known to be $(-1)^{i+j}$ times the determinant of the matrix
obtained by removing the i-th row and the j-th column from the
square weight matrix [W] and $\epsilon$ is the learning rate.
Similarly, the i-th bias measure $\Delta W_{i0}$ can be expressed as
follows:

$$\Delta W_{i0} = \epsilon(1 - 2Y_i)$$ [Eqn. 18]
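For comparison with the multi-channel rules, the single-unit rules of Eqns. 11-12 can be exercised numerically. The following is a minimal sketch only, not the patented implementation: it assumes a Gaussian input with mean 2.0 and standard deviation 0.5, averages the update over the whole training set at each step, and uses a step size of 0.1 chosen purely for quick convergence in this toy setting. The function name `train_single_unit` and all numeric settings are illustrative assumptions.

```python
import numpy as np

def train_single_unit(x, lr=0.1, steps=20000, w=1.0, w0=0.0):
    """Batch gradient ascent on output entropy for y = 1/(1 + exp(-(w*x + w0)))."""
    for _ in range(steps):
        y = 1.0 / (1.0 + np.exp(-(w * x + w0)))
        # Eqn. 11: anti-decay term 1/w plus anti-Hebbian term x(1 - 2y)
        dw = np.mean(1.0 / w + x * (1.0 - 2.0 * y))
        # Eqn. 12: bias measure (1 - 2y)
        dw0 = np.mean(1.0 - 2.0 * y)
        w += lr * dw
        w0 += lr * dw0
    return w, w0

# Assumed toy input: Gaussian with mean 2.0 and standard deviation 0.5.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=2000)
w, w0 = train_single_unit(x)

# The steepest point of the logistic curve sits where w*x + w0 = 0.
center = -w0 / w
print(f"slope w = {w:.3f}, sigmoid center = {center:.3f}")
```

In this sketch the steepest point of the sigmoid moves toward the peak of the input density and the slope grows from its starting value, which is the alignment behavior described for the bias and scaling measures.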
The rules shown in Eqns. 17-18 are the same as those for
the single unit mapping (Eqns. 11-12) except that the
instability occurs at det[W]=0 instead of w=0. Thus, any
degenerate weight matrix leads to instability because any
weight matrix having a zero determinant is degenerate. This
fact enables different outputs $Y_i$ to learn to represent different
things about the inputs $X_j$. When the weight vectors
entering two different outputs become too similar, det[W]
becomes small and the natural learning process forces these
approaching weight vectors apart. This effect is mediated by
the numerator $\mathrm{cof}[W_{ij}]$, which approaches zero to indicate
degeneracy in the weight matrix of the rest of the layer not
associated with input $X_j$ or output $Y_i$.
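The relationship between the cofactor form of Eqn. 17 and the inverse-transpose form of Eqn. 15 can be checked numerically. The sketch below is illustrative only: the helper names `cofactor` and `anti_redundancy` and the example matrices are assumptions, not part of the specification. It also shows how the anti-redundancy term grows as two weight vectors approach each other and det[W] approaches zero.

```python
import numpy as np

def cofactor(W, i, j):
    """(-1)^(i+j) times the determinant of W with row i and column j removed."""
    minor = np.delete(np.delete(W, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def anti_redundancy(W):
    """Matrix of cof[W_ij] / det[W], the first term of Eqn. 17."""
    n = W.shape[0]
    cof = np.array([[cofactor(W, i, j) for j in range(n)] for i in range(n)])
    return cof / np.linalg.det(W)

# Arbitrary non-singular example matrix (assumed values).
W = np.array([[2.0, 1.0, 0.0],
              [0.5, 1.5, 0.3],
              [0.1, 0.2, 1.0]])
matches_inverse_transpose = np.allclose(anti_redundancy(W), np.linalg.inv(W).T)

def largest_term(delta):
    """Largest anti-redundancy entry when two weight vectors differ only by delta."""
    Wd = np.array([[1.0, 2.0],
                   [1.0, 2.0 + delta]])
    return np.abs(anti_redundancy(Wd)).max()

print(matches_inverse_transpose, largest_term(1e-1), largest_term(1e-3))
```

Because the term scales as 1/det[W], nearly parallel weight vectors receive a large corrective update, which is one way to read the statement that approaching weight vectors are forced apart.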

Other sigmoidal transformations yield other training rules
that are similarly advantageous as discussed above in connection
with Eqn. 8. For instance, the hyperbolic tangent
function yields rules very similar to those of Eqns. 17-18:

$$\Delta W_{ij} = \epsilon\left(\frac{\mathrm{cof}[W_{ij}]}{\det[W]} - 2X_j Y_i\right)$$ [Eqn. 19]

$$\Delta W_{i0} = \epsilon(-2Y_i)$$ [Eqn. 20]

The usefulness of these blind source separation network
learning rules can be appreciated with reference to the
discussion below in connection with FIG. 5.

Blind Deconvolution in a Causal Filter

FIGS. 2B-2C illustrate the blind deconvolution problem.
FIG. 2C shows an unobserved data sequence S(t) entering an
unknown channel A(t), which responsively produces the
measured signal X(t) that can be blindly equalized through
a causal filter W(t) to produce an output signal U(t) approximating
the original unobserved data sequence S(t). FIG. 2B
shows the time series X(t), which is presumed to have a
length of J samples (not shown). X(t) is convolved with a
causal filter having I weighted taps, $W_1, \ldots, W_I$, and impulse
response W(t). The causal filter output signal U(t) is then
passed through a non-linear sigmoid function g(·) to create
the training signal Y(t) (not shown). This system can be
expressed either as a convolution (Eqn. 21) or as a matrix
equation (Eqn. 22) as follows:

$$Y(t) = g(U(t)) = g(W(t) * X(t))$$ [Eqn. 21]

$$[Y] = g([U]) = g([W][X])$$ [Eqn. 22]

in which [Y]=g([U]) and [X] are signal sample vectors
having J samples. Of course, the vector ordering need not be
temporal. For causal filtering, [W] is a banded lower triangular
J×J square matrix expressed as:

$$[W] = \begin{bmatrix} W_I & 0 & \cdots & \cdots & 0 \\ W_{I-1} & W_I & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ W_1 & \cdots & W_I & \cdots & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & \cdots & W_1 & \cdots & W_I \end{bmatrix}$$ [Eqn. 23]

Assuming an ensemble of time series, the joint probability
distribution functions $f_y([Y])$ and $f_x([X])$ are related by
the Jacobian of the Eqn. 22 transformation according to Eqn.
13. The ensemble can be "created" from a single time series
by breaking the series into sequences of length I, which
reduces [W] in Eqn. 23 to an I×I lower triangular matrix. The
Jacobian of the transformation is then written as follows:

$$J = \det\left[\frac{\partial Y(t_i)}{\partial X(t_j)}\right]_{ij} = (\det[W]) \prod_{t=1}^{I} Y'(t)$$ [Eqn. 24]

which may be decomposed into the determinant of the
weight matrix [W] of Eqn. 23 and the product of the slopes
of the sigmoidal squashing function for all times t. Because
[W] is lower-triangular, its determinant is merely the product
of the diagonal values, which is $W_I^I$. As before, the output
signal entropy H(Y) is maximized by maximizing the logarithm
of the Jacobian, which may be written as:

$$\ln|J| = \ln\left|W_I^I\right| + \sum_{t=1}^{I} \ln|Y'(t)|$$ [Eqn. 25]

If the hyperbolic tangent is selected as the nonlinear
sigmoid function, then differentiation with respect to the
filter weights W(t) provides the following two simple learning
rules:

$$\Delta W_I \propto \sum_{t=1}^{I}\left(\frac{1}{W_I} - 2X_t Y_t\right)$$ [Eqn. 26]

$$\Delta W_{I-j} \propto \sum_{t=j}^{I}\left(-2X_{t-j} Y_t\right)$$ [Eqn. 27]

In Eqns. 26-27, $W_I$ is the "leading weight" and the
$W_{I-j}$ (where $0 < j < I$) represent the remaining weights in a delay
line having I weighted taps linking the input signal sample
$X_{t-j}$ to the output signal sample $Y_t$. The leading weight $W_I$
therefore adapts like a weight connected to a neuron with
only that one input (Eqn. 11 above). The other tap weights
attempt to decorrelate the past input from the present
output. Thus, the leading weight $W_I$ keeps the causal filter
from "shrinking".

Other sigmoidal functions may be used to generate similarly
useful learning rules, as discussed above in connection
with Eqn. 8. The equivalent rules for the logistic transfer
function discussed above can be easily deduced to be:

$$\Delta W_I \propto \sum_{t=1}^{I}\left(\frac{1}{W_I} + X_t(1 - 2Y_t)\right)$$ [Eqn. 28]

$$\Delta W_{I-j} \propto \sum_{t=j}^{I} X_{t-j}(1 - 2Y_t)$$ [Eqn. 29]

The usefulness of these causal filter learning rules can be
appreciated with reference to the discussion below in connection
with FIGS. 6 and 7.

Information Maximization v. Statistical Dependence

The process of this invention relies on the unexpectedly
advantageous observation that, under certain conditions, the
maximization of the mutual information I(Y,X) operates to
minimize the mutual information between separate outputs
{$U_i$} in a multiple source network, thereby performing the
redundancy reduction required to solve the blind signal
processing problem. The usefulness of this relationship was
unsuspected until now. When limited to the usual logistic
transfer or hyperbolic tangent sigmoid functions, this invention
appears to be limited to the general class of super-Gaussian
signals having kurtosis greater than 3. This limitation
can be understood by considering the following
example shown in FIGS. 3A-3C.

Referring to FIG. 3A, consider a network with two
outputs $y_1$ and $y_2$, which may be either two output channels
from a blind source separation network or two signal
samples at different times for a blind deconvolution network.
The joint entropy of these two variables can be written as:

$$H(y_1, y_2) = H(y_1) + H(y_2) - I(y_1, y_2)$$ [Eqn. 30]

Thus, the joint entropy can be maximized by maximizing
the individual entropies while minimizing the mutual information
$I(y_1, y_2)$ shared between the two. When the mutual
information $I(y_1, y_2)$ is zero, the two variables $y_1$ and $y_2$ are
statistically independent and the joint probability density
function is equal to the product of the individual probability
density functions so that $f_{y_1 y_2}(y_1, y_2) = f_{y_1}(y_1)\, f_{y_2}(y_2)$. Both the
ICA and the "whitening" approach to deconvolution are
examples of pair-wise minimization of mutual information
$I(y_1, y_2)$ for all pairs $y_1$ and $y_2$. This process is variously
denominated factorial code learning, predictability
minimization, independent component analysis (ICA) and
redundancy reduction.

The process of this invention is a stochastic gradient
ascent procedure that maximizes the joint entropy $H(y_1, y_2)$,
thereby differing sharply from these "whitening" and ICA
procedures known for minimizing mutual information
$I(y_1, y_2)$. The system of this invention rests on the unexpectedly
advantageous discovery of the general conditions under
which maximizing joint entropy operates to reduce mutual
information (redundancy), thereby reducing the statistical
dependence of the two outputs $y_1$ and $y_2$.

Under many conditions, maximizing joint entropy
$H(y_1, y_2)$ does not guarantee minimization of mutual information
$I(y_1, y_2)$ because of interference from the other single
entropy terms $H(y_i)$ in Eqn. 30. FIG. 3C shows one pathological
example where a "diagonal" projection of two
independent, uniformly-distributed variables $x_1$ and $x_2$ is
preferred over the "independent" projection shown in FIG.
3B when joint entropy is maximized. This occurs because of
a mismatch between the requisite alignment of input probability
distribution function and sigmoid slope discussed
above in connection with FIGS. 1A-1C and Eqn. 8. The
learning procedure of this invention achieves the higher
value of joint entropy shown in FIG. 3C rather than the desired
value shown in FIG. 3B because of the higher individual
output entropy values $H(y_i)$ arising from the triangular
probability distribution functions of $(x_1 + x_2)$ and $(x_1 - x_2)$ of
FIG. 3C, which more closely match the sigmoid slope (not
shown). This interferes with the minimization of mutual
information $I(y_1, y_2)$ because the increases in individual
entropy $H(y_i)$ offset or mask undesired increases in mutual
information to provide the higher joint entropy $H(y_1, y_2)$
sought by the process.

The inventor believes that such interference has little
significant effect in most practical situations, however. As
mentioned above in connection with Eqn. 8, the sigmoidal
function is not limited to the usual two functions and indeed
can be tailored to the particular class of probability distribution
functions expected by the process of this invention.
Any function that is a member of the class of solutions to the
partial differential Eqn. 8 provides a sigmoidal function
suitable for use with the process of this invention. It can be
shown that this general class of sigmoidal functions leads to
the following two learning rules according to this invention:

$$[\Delta W] = \epsilon\left([W^T]^{-1} - r\,|Y|^{r-1}\,\mathrm{sgn}(Y)\,[X]^T\right)$$ [Eqn. 31]

$$[\Delta W_0] = \epsilon\left(-r\,|Y|^{r-1}\,\mathrm{sgn}(Y)\right)$$ [Eqn. 32]

where

$$\mathrm{sgn}(y_i) = \begin{cases} +1 & \text{for } y_i > 0 \\ 0 & \text{for } y_i = 0 \\ -1 & \text{for } y_i < 0 \end{cases}$$

and where parameter r is chosen appropriately for the
presumed kurtosis of the probability distribution function of
the source signals {$S_i$}. This formalism can be extended to
cover skewed and multimodal input distributions by
extending Eqn. 8 to produce an increasingly complex polynomial
in g(x).

Even with the usual logistic transfer function (Eqn. 7) and
the hyperbolic tangent function (r=2), it appears that the
problem of individual entropy interference is limited to
sub-Gaussian probability distribution functions having a
kurtosis less than 3. Advantageously, many actual analog
signals, including the speech signals used in the experimental
verification of the system of this invention, are super-Gaussian
in distribution. They have longer tails and are more
sharply peaked than the Gaussian distribution, as may be
appreciated with reference to the three distribution functions
shown in FIGS. 8A-8C. FIG. 8A shows a typical speech
probability distribution function, FIG. 8B shows the probability
distribution function for rock music and FIG. 8C
shows a typical Gaussian white noise distribution. The
inventor has found that joint entropy maximization for
sigmoidal networks always minimizes the mutual information
between the network outputs for all super-Gaussian
signal distributions tested. Special sigmoid functions can be
selected that are suitable for accomplishing the same result
for sub-Gaussian signal distributions as well, although the
precise learning rules must be selected in accordance with
the parametric learning rules of Eqns. 31-32.

Different sigmoid non-linearities provide different anti-Hebbian
terms. Table 1 provides the anti-Hebbian terms
from the learning rules resulting from several interesting
non-linear transformation functions. The information-maximization
rule consists of an anti-redundancy term
which always has a form of $[[W]^T]^{-1}$ and an anti-Hebbian
term that keeps the unit from saturating.

TABLE 1

Function:                          Slope:                            Anti-Hebbian term:

$y_i = (1 + e^{-u_i})^{-1}$        $y_i(1 - y_i)$                    $x_j(1 - 2y_i)$
$y_i = \tanh(u_i)$                 $1 - y_i^2$                       $-2x_j y_i$
Eqn. 8 solution                    $1 - |y_i|^r$                     $-r\,x_j\,|y_i|^{r-1}\,\mathrm{sgn}(y_i)$
$y_i = \mathrm{erf}(u_i)$          $(2/\sqrt{\pi})\,e^{-u_i^2}$      $-2x_j u_i$
$y_i = e^{-u_i^2}$                 $-2u_i y_i$                       $x_j(1 - 2u_i^2)/u_i$
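The role of the parameter r in Eqn. 8 can be illustrated numerically. The following sketch is an illustration under stated assumptions rather than part of the specification: it integrates Eqn. 8 from the boundary condition g(0)=0 with a classical fourth-order Runge-Kutta step (the integration method, grid and function names `solve_eqn8` and `anti_hebbian_factor` are all choices made here), confirms that r=2 reproduces the hyperbolic tangent, and evaluates the factor $-r|y|^{r-1}\mathrm{sgn}(y)$ that multiplies the input in the anti-Hebbian term of Eqns. 31-32.

```python
import numpy as np

def solve_eqn8(r, u_max=3.0, n=3001):
    """Integrate dg/du = 1 - |g|^r with g(0) = 0 on [0, u_max] using RK4."""
    u = np.linspace(0.0, u_max, n)
    h = u[1] - u[0]
    slope = lambda g: 1.0 - np.abs(g) ** r
    g = np.zeros(n)
    for k in range(n - 1):
        k1 = slope(g[k])
        k2 = slope(g[k] + 0.5 * h * k1)
        k3 = slope(g[k] + 0.5 * h * k2)
        k4 = slope(g[k] + h * k3)
        g[k + 1] = g[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return u, g

def anti_hebbian_factor(y, r):
    """Factor multiplying the input in Eqns. 31-32: -r |y|^(r-1) sgn(y)."""
    return -r * np.abs(y) ** (r - 1) * np.sign(y)

# r = 2 should reproduce the hyperbolic tangent.
u, g2 = solve_eqn8(2.0)
tanh_error = np.max(np.abs(g2 - np.tanh(u)))

# A larger r (here 4, an arbitrary choice) gives a different member of the class.
_, g4 = solve_eqn8(4.0)

y = np.linspace(-0.9, 0.9, 7)
factor_r2 = anti_hebbian_factor(y, 2.0)   # equals -2y for r = 2
print(tanh_error, g4[-1], factor_r2)
```

For r=2 the factor reduces to $-2y$, which agrees with the hyperbolic tangent rule; other values of r change how strongly large outputs are penalized.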
Table 1 shows that only the Eqn. 8 solutions (including
the hyperbolic tangent function for r=2) and the logistic
transfer functions produce anti-Hebbian terms that can yield
higher-order statistics. The other functions use the net input
$u_i$ as the output variable rather than using the actual transformed
output. Tests performed by the inventor show that the erf
function is unsuitable for blind separation. In fact, stable
weight matrices using the $-2x_j u_i$ term can be calculated from the
covariance matrix of the inputs alone. The learning rule for
a Gaussian radial basis function node is interesting because
it contains $u_i$ in both the numerator and denominator. The
denominator term limits the usefulness of such a rule
because data points near the radial basis function center
would cause instability. Radial transfer functions are generally
[...]

[...] represents "sensor" output signals [...]. The weights are
updated regularly according to the learning rules [...]. These
updates may be made after every signal sample or may be
accumulated over several signal samples before updating in a
global mode. Each of the weight elements in FIG. 5 exemplified
by element 18 includes the logic necessary to produce
and accumulate the $\Delta W$ update according to the applicable
learning rule.

The separation network in FIG. 5 can also be used to
remove interference from a receive signal merely by, for
example, isolating the interferer as output signal $U_1$ and
then subtracting $U_1$ from the receive signal of interest, such
as receive signal $X_1$. In such a configuration, the network
shown in FIG. 5 is herein denominated an "interference
cancelling" network.

FIG. 6 shows a functional block diagram illustrating a
simple causal filter operated according to the method of this
invention for blind deconvolution. A time-varying signal is
presented to the network at input 22. The five spaced taps
{$T_i$} are separated by a time-delay interval $\tau$ in the manner
well-known for transversal filters. The five weight
factors {$W_i$} are established and updated by internal logic
(not shown) according to the learning rules shown in Eqns.
[...]. The five weighted tap signals {$U_i$}
are summed to produce the single
time-varying output signal $U_t$. Because input signal $X_t$
includes an unknown non-linear combination of time-delayed
versions of an unknown source signal $S_t$, the system
of this invention adjusts the tap weights {$W_i$} such that
the output signal $U_t$ approximates the unknown source signal $S_t$.

FIG. 7 shows a functional block diagram illustrating the
combination of blind source separation network and blind
deconvolution filter systems of this invention. The blind
separation learning rules and the blind deconvolution rules
discussed above can be easily combined in the form exemplified
by FIG. 7. The objective is to maximize the natural
logarithm of a Jacobian with local lower triangular structure,
which yields the expected learning rule that forces the
leading weights {$W_{ij0}$} in the filters to follow the blind
separation rules and all others to follow a decorrelation rule
[...] to produce a set of training
signals given by Eqn. 33:

$$Y_i(t) = g(U_i(t)) = g\left(\sum_{j}\sum_{k} W_{ijk}\, X_j(t-k)\right)$$ [Eqn. 33]

where g(·) is the selected sigmoidal transfer function. If the
hyperbolic tangent function is selected as the sigmoidal
non-linearity, the following training rules are used in the
system of this invention:

$$\Delta W_{ij0} = \epsilon\left(\frac{\mathrm{cof}[W_{ij0}]}{\det[W_0]} - 2X_j Y_i\right)$$ [Eqn. 34]

$$\Delta W_{ijk} = \epsilon\left(-2X_j(t-k)\,Y_i\right), \quad \text{where } k > 0$$ [Eqn. 35]

In FIG. 7, each of the three input signals {$X_i$} contain
[...] as well as an unknown mixture of up to three unknown
[...] mixing elements exemplified by summing circuit 16. [...]
formed by the network. Preliminary experiments performed
by the inventor with [...] which [...]
simultaneously separated and deconvolved using the learning
rule discussed above resulted in recovery of apparently
perfect speech.

Experimental Results

The inventor conducted experiments using three-second
segments of speech recorded from various speakers with
only one speaker per recording. All speech segments were
sampled at 8,000 Hz from the output of the auxiliary
microphone of a Sparc-10 workstation. No special post-processing
was performed on the waveforms other than the
normalization of amplitudes to a common interval [-3,3] to
permit operation with the equipment used. The network was
trained using the stochastic gradient ascent procedure of this
invention.

Unsupervised learning in a neural network may proceed
either continuously or in a global mode. Continuous learning
consists in slightly modifying the weights after each propagation
of an input vector through the network. This kind of
learning is useful for signals that arrive in real time or when
local storage capacity is restricted. In a global learning
mode, a multiplicity of samples are propagated through the
network and the results stored locally. Statistics are computed
exactly on these data and the weights are modified
only after accumulating and processing the multiplicity of
signal samples.

To reduce computational overhead, these experiments
were performed using the global learning mode. To ensure
that the input ensemble is stationary in time, random points
were selected from the three-second window to generate the
input vectors. Various learning rates were used,
with 0.005 preferred. As used herein, learning rate $\epsilon$ establishes
the actual weight adjustment such that $W_{new} = W_{old} + \epsilon\,\Delta W$,
as is known in the art. The inventor found that
reducing the learning rate over the learning process was
useful.

Blind Separation Results

The network architecture
shown in FIGS. 2A and 5 together with the learning rules in
Eqns. 17-18 were found to be sufficient to perform blind
separation. A random mixing matrix [A] was generated with
values usually uniformly distributed in the interval [-1,1].
The mixing matrix [A] was used
to generate the several mixed time series [$X_i$] from the
original sources [$S_i$]. The unmixing matrix [W] and the bias
vector [$W_0$] were then trained according to the rules in Eqns.
17-18.
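The training procedure just described can be sketched in a few lines. This is a toy illustration under several assumptions, not the inventor's experimental code: it uses two synthetic Laplacian (super-Gaussian) sources in place of recorded speech, a full-batch average as the "global mode" statistic, the matrix form of the logistic rules (Eqns. 15-16, equivalent to Eqns. 17-18), a fixed learning rate of 0.005 and an arbitrary number of steps. The names `train_global` and `log_jacobian` are hypothetical. No claim is made that this short run fully separates the sources; the sketch only verifies that the average log-Jacobian being ascended increases.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_jacobian(W, w0, X):
    """Average ln|J| = ln|det W| + sum_i ln(y_i (1 - y_i)) over the samples in X."""
    U = W @ X + w0[:, None]
    # ln(y(1-y)) written in a numerically stable form: -|u| - 2 ln(1 + exp(-|u|))
    log_slope = -np.abs(U) - 2.0 * np.log1p(np.exp(-np.abs(U)))
    return np.log(abs(np.linalg.det(W))) + np.mean(np.sum(log_slope, axis=0))

def train_global(X, lr=0.005, steps=4000):
    """Global-mode training: weights change only after statistics over all samples."""
    n, m = X.shape
    W, w0 = np.eye(n), np.zeros(n)
    for _ in range(steps):
        Y = logistic(W @ X + w0[:, None])
        dW = np.linalg.inv(W.T) + (1.0 - 2.0 * Y) @ X.T / m   # Eqn. 15, batch average
        dw0 = np.mean(1.0 - 2.0 * Y, axis=1)                  # Eqn. 16, batch average
        W += lr * dW
        w0 += lr * dw0
    return W, w0

rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 4000))             # assumed super-Gaussian sources
A = rng.uniform(-1.0, 1.0, size=(2, 2))     # random mixing matrix in [-1, 1]
X = A @ S                                   # mixed time series

before = log_jacobian(np.eye(2), np.zeros(2), X)
W, w0 = train_global(X)
after = log_jacobian(W, w0, X)
print(before, after)
print(W @ A)   # inspect how close the product is to a scaled permutation
```

A continuous (per-sample) variant would apply the same two updates after each input vector instead of after the batch average.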

FIG. 10 shows the results of the attempted separation of
five source signals. The mixtures [$X_i$] formed an incomprehensible
babble that could not be penetrated by the human
ear. The unmixed solutions shown as [$Y_i$] were obtained
after presenting about 500,000 time samples, equivalent to
20 passes through the complete three-second series. Any
residual interference in the output vector elements [$Y_i$] is
inaudible to the human ear. This can be appreciated with
reference to the permutation structure of the product of the
final weight matrix [W] and the initial mixing matrix [A]:

$$[W][A] = \begin{bmatrix} -4.09 & 0.13 & 0.09 & -0.07 & -0.01 \\ 0.07 & -2.92 & 0.00 & 0.02 & -0.06 \\ 0.02 & -0.02 & -0.06 & -0.08 & -2.20 \\ 0.02 & 0.03 & 0.00 & 1.97 & 0.02 \\ -0.07 & 0.14 & -3.50 & -0.01 & 0.04 \end{bmatrix}$$

As can be seen, the residual interference factors are only
a few percent of the single substantial entry in each row and
column, thereby demonstrating that weight matrix [W]
substantially removes all effects of mixing matrix [A] from
the signals.
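The "few percent" statement can be quantified from the reported product. The short sketch below is illustrative only; the function name `permutation_structure` is an assumption, and the matrix entries are as read from the product reported above. It finds the dominant entry in each row, checks that the dominant entries fall in distinct columns, and reports the largest remaining entry of any row relative to that row's dominant entry.

```python
import numpy as np

def permutation_structure(P):
    """Return (dominant entries occupy distinct columns, worst residual/dominant ratio)."""
    absP = np.abs(P)
    rows = np.arange(P.shape[0])
    dom_cols = absP.argmax(axis=1)               # column of the dominant entry per row
    distinct = len(set(dom_cols.tolist())) == P.shape[0]
    dominant = absP[rows, dom_cols]
    residual = absP.copy()
    residual[rows, dom_cols] = 0.0               # mask out the dominant entries
    worst_ratio = (residual.max(axis=1) / dominant).max()
    return distinct, worst_ratio

# Entries as read from the [W][A] product reported above.
WA = np.array([[-4.09,  0.13,  0.09, -0.07, -0.01],
               [ 0.07, -2.92,  0.00,  0.02, -0.06],
               [ 0.02, -0.02, -0.06, -0.08, -2.20],
               [ 0.02,  0.03,  0.00,  1.97,  0.02],
               [-0.07,  0.14, -3.50, -0.01,  0.04]])

distinct, worst_ratio = permutation_structure(WA)
print(distinct, round(worst_ratio, 3))
```

A product with this structure means each output carries one source, up to a reordering and a scale factor, which is the most that blind separation can determine.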

In a second experiment, seven source signals, including
five speaking voices, a rock music selection and white noise,
were successfully separated, although the separation was
still slowly improving after 2.5 million iterations, equivalent
to 100 passes through the three-second data. For two
sources, convergence is normally achieved in less than one
pass through the three seconds of data by the system of this
invention.

The blind separation procedure of this invention was 



The first whitening example shows what happens when
"deconvolving" a speech signal that has not been corrupted
(convolving filter [A] is a delta-function). If the tap spacing
is close enough, as in this case where the tap spacing is
identical to the sample interval, the process of this invention
learns the whitening filter shown in FIG. 11C that flattens the
amplitude spectrum of the speech up to the Nyquist limit
(equivalent to half of the sampling frequency). FIG. 9A
shows the spectrum of the speech sequence before deconvolution
and FIG. 9B shows the speech spectrum after
deconvolution by the filter shown in FIG. 11C. Whitened
speech sounds like a clear sharp version of the original
signal because the phase structure is preserved. By using all
available frequency levels equally, the system is maximizing
information throughput in the channel. Thus, when the
original signal is not white, the deconvolving filter of this
invention will recover a whitened version of it rather than
the exact original. However, when the filter taps are spaced
further apart, as in FIGS. 11E-11L, there is less opportunity
for simple whitening.

In the second '*barrcl-cffcct** example shown in FIG. 11E, 
a 6.25 ms echo is added to the speech signal This creates a 
mild audible barrel effect Because filter HE is finite in 
length, its inverse is infinite in length but is shown in FIG. 
25 UF as truncated. The inverting filter learned in FIG. 11G 
resembles FIG. IIP although the resemblance tails off 
toward the left side because the process of this invention 
actually learns an optimal filter of finite length instead of a 
truncated infinite optimal filter. The resulting deconvolution 
shown in FIG. 11H is very good. 

The best results from the blind deconvolution process of 
this invention are seen when the ideal deconvolving filter is 
of finite length, as in the third example shown in FIGS. 
11I-11L. FIG. Ill shows a set of exponentially-decaying 



me Oiina separation prowoure w uus> wr^uvu - :-, tl -**H tw a 

found to fail oniVwhto: (a)mc« than one unknown .ource » echoes spread out over 27Smi that *J™£* ■ 

«Gau S su«wWteno«e,ai.d(b)wh e nthennxlng tM lrixlA] two-pou,t Uta shown in HO. 11J wA ■ J^*«£o* 

I oeady singular. Both weakness are underflandable correct™ onto left, « "f»f.°f 

becaTno jSure can aeparate independent Gatasian J *e oonvohjtag fito Amnu 



unmixing matrix [W] i 

the expression in Bqn. 17 quite unstable in the vicinity of a 
solution. 

In contrast with these results, experience with similar tests 
of the HI network shows it occasionally fails to converge for 
two sources and rarely converges for three sources. 

Blind Deconvolution Results: Speech signals were con- 
volved with various filters and the learning rules in Eons. 
26-27 were used to perform blind deconvolution. Some 
results are shown in FIGS. 11A-11L. The convolving filter 
tune domains shown in FIGS. 11 A, UE and 111. contained so 
some zero values. For example, FIG. U£ represents the 
filter 10.8,0,0,0,11. Moreover, the taps were sometimes adja- 
cent to each other, as in FIGS. 11A-11D, and sometimes 
spaced apart in time, as in FIGS. 1U-1UL. The leading 
weight of each filter is the right-most bar in each histogram, 55 
exernplified by bar 30 in FIG. Ill and bar 32 in FIG. 11G. 

A whitening experiment is shown in FIGS. 11A-11D. a 
barrel-effect experiment in FIGS. 11E-11H and multiple- 

2K3 * *owo follow* by those of fceide* decoa- ^—^ffl M SlSS 
volving filter [W^, those of the filter produced by the 
process of this invention [W] and the time domain pattern 
produced by convolution of iWj and [A]. Ideally, the con- 
volution IWWAJ should be a delta-function consisting of 65 
only a single high value at the right-most position of the 
leading weight when (WJ correctly inverts [A], 



almost perfect. This result demonstrates the sensitivity of the 
blind processing method of this invention in cases where the 
tap-spacing is great enough (100 sample intervals) that 
simple whitening cannot interfere noticeably with the decon- 
45 volution process. 
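
The barrel-effect observation, that a finite echo filter has an infinitely long inverse which must be truncated in practice, can be illustrated numerically. The following is a hedged sketch rather than the procedure of this invention: it computes the truncated series inverse analytically instead of learning it, the function names convolve and truncated_inverse are hypothetical, and index 0 holds the leading weight (the figures draw it right-most). The gain 0.8 and four-tap delay follow the five-tap example filter of FIG. 11E.

```python
def convolve(a, b):
    """Full discrete convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def truncated_inverse(echo_gain, echo_delay, n_terms):
    """First n_terms of the series inverse of 1 + echo_gain * z^-echo_delay."""
    w = [0.0] * ((n_terms - 1) * echo_delay + 1)
    for k in range(n_terms):
        w[k * echo_delay] = (-echo_gain) ** k
    return w


# Direct path plus one echo of gain 0.8 arriving four taps later.
a = [1.0, 0.0, 0.0, 0.0, 0.8]
w = truncated_inverse(0.8, 4, 10)
wa = convolve(w, a)

# wa is a delta at index 0 plus one leftover tap caused by the truncation.
print("leading tap:", wa[0])
print("truncation residue at tap 40:", wa[40])
```

The single leftover tap has magnitude 0.8 raised to the number of retained terms, so it shrinks geometrically as the truncated inverse is lengthened. A learned filter of the same finite length is not obliged to match this truncated series, which is consistent with the difference noted between FIGS. 11F and 11G.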

Clearly, other embodiments and modifications of this
invention may occur readily to those of ordinary skill in the
art in view of these teachings. Therefore, this invention is to
be limited only by the following claims, which include all
such embodiments and modifications when viewed in conjunction
with the above specification and accompanying
drawing.
I claim: 

1. A method performed in a neural network having input
means for receiving a plurality J of input signals (X_j) and
output means for producing a plurality I of output signals (U_i),
each said output signal U_i representing a combination of
said input signals (X_j) weighted by a plurality I of bias
weights (W_i0) and a plurality I^2 of scaling weights (W_ij) such
that (U_i)=(W_ij)(X_j)+(W_i0),
wherein 0<i≤I>1 and 0<j≤J>1 are integers, said method
comprising:

(a) selecting initial values for said bias weights (W_i0) and
said scaling weights (W_ij);

(b) producing a plurality I of training signals (Y_i) responsive
to a transformation of said input signals (X_j) such
that Y_i=g(U_i), wherein g(x) is a nonlinear function and
the Jacobian of said transformation is J=det(∂Y_i/∂X_j)
when J=I; and

(c) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight
W_i0 is changed proportionately to a corresponding bias
measure ΔW_i0 accumulated over said one or more
samples and each said scaling weight W_ij is changed
proportionately to a corresponding scaling measure
ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said one or
more samples, wherein ε>0 is a learning rate.

2. The method of claim 1 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of the solutions to the equation

∂g(x)/∂x = 1-|g(x)|^r

and said ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said one
or more samples and each said scaling weight W_ij is changed
proportionately to a corresponding scaling measure ΔW_ij=
ε·((cof(W_ij)/det(W_ij))-rX_j|Y_i|^(r-1) sgn(Y_i)) accumulated over
said one or more samples.

3. The method of claim 1 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
ΔW_i0 selected from the group consisting essentially of
Δ_1W_i0=ε·(-2Y_i) and Δ_2W_i0=ε·(1-2Y_i) accumulated over
said one or more samples and each said scaling weight W_ij
is changed proportionately to a corresponding scaling
measure ΔW_ij selected from the group consisting essentially
of Δ_1W_ij=ε·((cof(W_ij)/det(W_ij))-2X_jY_i) and Δ_2W_ij=ε·((cof
(W_ij)/det(W_ij))+X_j(1-2Y_i)) accumulated over said one or
more samples.

4. A neural-network implemented method for recovering
one or more of a plurality I of independent source signals
(S_i) from a plurality J≥I of sensor signals (X_j) each including
a combination of at least some of said source signals (S_i),
wherein 0<i≤I>1 and 0<j≤J≥I are integers, said method
comprising:

(a) selecting a plurality I of bias weights (W_i0) and a
plurality I^2 of scaling weights (W_ij);

(b) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) by repeatedly performing the steps of:

(b.1) producing a plurality I of estimation signals (U_i)
responsive to said sensor signals (X_j) such that
(U_i)=(W_ij)(X_j)+(W_i0),

(b.2) producing a plurality I of training signals (Y_i)
responsive to a transformation of said sensor signals
(X_j) such that Y_i=g(U_i), wherein g(x) is a nonlinear
function and the Jacobian of said transformation is
J=det(∂Y_i/∂X_j) when J=I, and

(b.3) adjusting each said bias weight W_i0 and each said
scaling weight W_ij responsive to one or more
samples of said training signals (Y_i) such that said
each bias weight W_i0 is changed proportionately to a
bias measure ΔW_i0 accumulated over said one or
more samples and said each scaling weight W_ij is
changed proportionately to a corresponding scaling
measure ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate;
and

(c) producing said estimation signals (U_i) to represent said
one or more recovered source signals (S_i).

5. The method of claim 4 wherein said nonlinear function
is a nonlinear function selected from a group consisting
essentially of the solutions to the equation

∂g(x)/∂x = 1-|g(x)|^r

and said ΔW_i0=ε·(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said
one or more samples and each said scaling weight W_ij is
changed proportionately to a corresponding scaling measure
ΔW_ij=ε·((cof(W_ij)/det(W_ij))-rX_j|Y_i|^(r-1) sgn(Y_i)) accumulated
over said one or more samples.

6. The method of claim 4 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
adjusting comprises:

(c) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight W_i0
is changed proportionately to a corresponding bias
measure ΔW_i0 selected from the group consisting
essentially of Δ_1W_i0=ε·(-2Y_i) and Δ_2W_i0=ε·(1-2Y_i)
accumulated over said one or more samples and each
said scaling weight W_ij is changed proportionately to
a corresponding scaling measure ΔW_ij selected
from the group consisting essentially of Δ_1W_ij=ε·((cof
(W_ij)/det(W_ij))-2X_jY_i) and Δ_2W_ij=ε·((cof(W_ij)/det
(W_ij))+X_j(1-2Y_i)) accumulated over said one or more
samples.

7. A method implemented in a transversal filter having an
input for receiving a sensor signal X that includes a combination
of multipath reverberations of a source signal S and
a plurality I of delay line tap output signals (T_i)
distributed at intervals of one or more time delays t, said
source signal S and said sensor signal X varying with time
over a plurality J≥I of said time delay intervals t such that
said sensor signal X has a value X_j at time t(j-1) and
said delay line tap output signal T_i has a value [...]
representing said sensor signal value X_j delayed by a time
interval t(i-1), wherein t>0 is a predetermined constant and
0<i≤I>1 and 0<j≤J≥I are integers, said method recovering
said source signal S from said sensor signal X and comprising:

(a) selecting a plurality I of filter weights (W_i);

(b) adjusting said filter weights (W_i) by repeatedly performing
the steps of

(b.1) producing a plurality K=I of weighted tap output
signals (V_k) by combining said delay line tap output
signals (T_i) such that (V_k)=(F_ki)(T_i), wherein
0<k≤K=I>1 are integers, and wherein F_ki=W_(k+1-i)
when 1≤k+1-i≤I and F_ki=0 otherwise,

(b.2) summing a plurality K=I of said weighted tap
signals (V_k) to produce an estimation signal [...]
wherein said estimation signal U has a value U_j at time [...],

(b.3) producing a plurality J of training signals (Y_j)
responsive to a transformation of said sensor signal
values (X_j) such that Y_j=g(U_j) wherein g(x) is a
nonlinear function and the Jacobian of said transformation
is J=det(∂Y_j/∂X_j) when J=I, and

(b.4) adjusting each said filter weight W_i responsive to
one or more samples of said training signals (Y_j)
such that said each filter weight W_i is changed
proportionately to a corresponding leading measure
ΔW_1 accumulated over said one or more samples
when i=1 and a corresponding scaling measure ΔW_i=
ε·∂(ln|J|)/∂W_i accumulated over said one or more
samples otherwise; and

(c) producing said estimation signal U to represent said
recovered source signal S.
8. The method of claim 7 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1) and said
ΔW_1 selected from the group consisting essentially of

[...]

accumulated over said one or more samples when i=1 and a
corresponding scaling measure ΔW_i selected from the group
consisting essentially of

Δ_1W_i=ε·Σ_j(-2X_(j-i+1)Y_j) and Δ_2W_i=ε·Σ_j X_(j-i+1)(1-2Y_j)

accumulated over said one or more samples otherwise.

9. The method of claim 7 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of the solutions to the equation

∂g(x)/∂x = 1-|g(x)|^r

and said ΔW_1 [...]

accumulated over said one or more samples when i=1 and a
corresponding scaling measure [...]
accumulated over said one or more samples otherwise.

10. A neural network for recovering a plurality of source
signals from a plurality of mixtures of said source signals,
said neural network comprising:

input means for receiving a plurality J of input signals (X_j)
each including a combination of at least some of a
plurality I of independent source signals (S_i), wherein
0<i≤I>1 and 0<j≤J≥I are integers;

weight means coupled to said input means for storing a
plurality I of bias weights (W_i0) and a plurality I^2 of
scaling weights (W_ij);

output means coupled to said weight means for producing
a plurality I of output signals (U_i) responsive to said
input signals (X_j) such that (U_i)=(W_ij)(X_j)+(W_i0);

training means coupled to said output means for producing
a plurality I of training signals (Y_i) responsive to a
transformation of said input signals (X_j) such that
Y_i=g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J=det(∂Y_i/∂X_j) when J=I;

adjusting means coupled to said training means and said
weight means for adjusting said bias weights (W_i0) and
said scaling weights (W_ij) responsive to one or more
samples of said training signals (Y_i) such that each said
bias weight W_i0 is changed proportionately to a corresponding
bias measure ΔW_i0 accumulated over said
one or more samples and each said scaling weight W_ij
is changed proportionately to a corresponding scaling
measure ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate.
11. The neural network of claim 10 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of the solutions to the equation

∂g(x)/∂x = 1-|g(x)|^r

and said bias measure ΔW_i0=(-r|Y_i|^(r-1) sgn(Y_i)) and said
scaling measure ΔW_ij=(cof(W_ij)/det(W_ij))-rX_j|Y_i|^(r-1) sgn(Y_i).

12. The neural network of claim 10 wherein said nonlinear
function g(x) is a nonlinear function selected from a
group consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+
e^(-x))^(-1) and said bias measure ΔW_i0 is selected from a group
consisting essentially of Δ_1W_i0=-2Y_i and Δ_2W_i0=1-2Y_i and
said scaling measure ΔW_ij is selected from a group consisting
essentially of Δ_1W_ij=(cof(W_ij)/det(W_ij))-2X_jY_i and
Δ_2W_ij=(cof(W_ij)/det(W_ij))+X_j(1-2Y_i).

13. A system for adaptively cancelling one or more
interferer signals (S_n) comprising:

input means for receiving a plurality J of input signals (X_j)
each including a combination of at least some of a
plurality I of independent source signals (S_i) that
includes said one or more interferer signals (S_n),
wherein 0<i≤I>1, 0<j≤J≥I and 0<n≤N≥1 are integers;

weight means coupled to said input means for storing a
plurality I of bias weights (W_i0) and a plurality I^2 of
scaling weights (W_ij);

output means coupled to said weight means for producing
a plurality I of output signals (U_i) responsive to said
input signals (X_j) such that (U_i)=(W_ij)(X_j)+(W_i0);

training means coupled to said output means for producing
a plurality I of training signals (Y_i) responsive to a
transformation of said input signals (X_j) such that
Y_i=g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J=det(∂Y_i/∂X_j);

adjusting means coupled to said training means and said
weight means for adjusting said bias weights (W_i0) and
said scaling weights (W_ij) responsive to one or more
samples of said training signals (Y_i) such that each said
bias weight W_i0 is changed proportionately to a corresponding
bias measure ΔW_i0 accumulated over said
one or more samples and each said scaling weight W_ij
is changed proportionately to a corresponding scaling
measure ΔW_ij=ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate;
and

feedback means coupled to said output means and said
input means for selecting one or more said output
signals (U_n) representing said one or more interferer
signals (S_n) for combination with said input signals
(X_j), thereby cancelling said interferer signals (S_n).

14. The system of claim 13 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of the solutions to the equation

∂g(x)/∂x = 1-|g(x)|^r

and said bias measure ΔW_i0=(-r|Y_i|^(r-1) sgn(Y_i)) and said
scaling measure ΔW_ij=(cof(W_ij)/det(W_ij))-rX_j|Y_i|^(r-1) sgn(Y_i).
15. The system of claim 13 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of g_1(x)=tanh(x) and g_2(x)=(1+e^(-x))^(-1)
and said bias measure ΔW_i0 is selected from a group
consisting essentially of Δ_1W_i0=-2Y_i and Δ_2W_i0=1-2Y_i and
said scaling measure ΔW_ij is selected from a group consisting
essentially of Δ_1W_ij=(cof(W_ij)/det(W_ij))-2X_jY_i and
Δ_2W_ij=(cof(W_ij)/det(W_ij))+X_j(1-2Y_i).
WHAT IS CLAIMED IS:

1. A medical system for separating electrocardiogram (EKG) signals, comprising:

a receiving module configured to receive a plurality J of recorded EKG signals X_j
from a plurality of EKG sensors;
a computing module configured to separate the received signals using independent
component analysis to produce a plurality I of separated signals Y_i; and

a display module configured to display the separated signals.

2. The medical system of claim 1, wherein the display module is further configured to display 
at least a portion of the separated signals in a chaos phase space portrait. 

3. The medical system of claim 2, wherein the separated signals include three components of
QRS complex, and wherein the display module is further configured to display at least the three 
QRS complex components in a chaos phase space portrait. 

4. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals by multiplying the recorded signals by a matrix W_ij such that Y_i = W_ij * X_j.

5. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals using a neural-network implemented method, the method comprising:

selecting a plurality I of bias weights W_i0 and a plurality I*J of scaling weights W_ij;
adjusting the bias weights W_i0 and the scaling weights W_ij to minimize information
redundancy among separated signals; and
producing separated signals Y_i such that Y_i = W_ij * X_j + W_i0.

6. The medical system of claim 1, further comprising a database storing a plurality of EKG 
signal triggers and corresponding diagnosis, and a matching module configured to match the 
separated signals with one or more of the stored EKG signal triggers. 

7. A computer-implemented method of separating electrocardiogram (EKG) recording signals,
the method comprising:

receiving a first plurality of EKG recording signals from EKG sensors placed on a
patient;

separating the first plurality of EKG recording signals using independent
component analysis to produce a second plurality of separated signals; and
displaying the separated signals.

8. The method of claim 7, further comprising displaying at least a portion of the separated 
signals in a chaos phase space portrait. 

9. The method of claim 7, wherein the patient is a pregnant patient, and wherein the separated
signals include separated signals originating from the pregnant patient and separated signals
originating from a fetus.

10. The method of claim 7, wherein the displayed separated signals are used by a physician to 
determine the likelihood of arrhythmia in the patient. 
11. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of myocardial infarction in the patient. 

12. The method of claim 7, wherein each of the separated signals corresponds to a location on
the patient body, wherein the displayed separated signals are used by a physician to determine the
location of an abnormal heart condition in the patient according to the separated signals'
corresponding locations.

13. A computer-assisted method of detecting arrhythmia in a patient, the method comprising:

placing a first plurality of EKG sensors on a patient to produce a first plurality of
channels of recorded EKG signals;
sending the recorded signals to a computing module to separate the first plurality of
EKG recorded signals into a first plurality of channels of separated signals using
independent component analysis; and

reviewing a display of the separated signals to determine the existence of
arrhythmia in the patient.

14. The method of claim 13, wherein reviewing a display of the separated signals comprises
identifying a second set of one or more channels of separated signals that indicate arrhythmia, the 
method further comprising determining a probable location of arrhythmia according to the 
respective channel numbers of the second set of separated signals. 

15. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing
a plurality of EKG sensors on more than 10 body surface locations of a patient's torso.

16. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing 
a plurality of EKG sensors on more than 40 body surface locations of a patient's torso. 

17. A cardiac rhythm management system comprising:

a cardiac signal recording module configured to record cardiac signals of a patient;
a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;

a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and

a treatment module configured to treat the patient when the abnormal condition is
detected or predicted.

18. The cardiac rhythm management system of claim 17, wherein the detection module is 
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 

19. A cardiac rhythm management system comprising:

a cardiac signal recording module configured to record cardiac signals of a patient;

a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;
a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and

a warning module configured to issue a warning when the abnormal condition is
detected or predicted.

20. The cardiac rhythm management system of claim 19, wherein the detection module is
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 